Exploiting HBM on FPGAs for Data Processing

Published: 09 December 2022

Abstract

Field Programmable Gate Arrays (FPGAs) are increasingly used in data centers and the cloud, both for their potential to accelerate certain workloads and for their architectural flexibility: they can serve as accelerators, smart-NICs, or stand-alone processors. To meet the challenges posed by these new use cases, FPGAs are evolving quickly in their capabilities and organization. The integration of High Bandwidth Memory (HBM) into FPGA devices is one recent example of this trend. In this article, we study the potential of FPGAs equipped with HBM from a data analytics perspective. We consider three workloads common in analytics-oriented databases (range selection, hash join, and stochastic gradient descent (SGD) for linear model training), implement them on an FPGA, and show in which cases they benefit from HBM. We integrate our designs into a columnar database (MonetDB) and show the trade-offs that arise from the integration with respect to data movement and partitioning. We consider two possible configurations of the HBM: a single-clock design and a dual-clock design. With the right design, FPGA+HBM-based solutions surpass the highest performance provided by either a two-socket POWER9 system or a 14-core Xeon E5 by up to 5.9× (range selection), 18.3× (hash join), and 6.1× (SGD).

Published in

ACM Transactions on Reconfigurable Technology and Systems, Volume 15, Issue 4
December 2022
476 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/3540252
Editor: Deming Chen

              Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 9 December 2022
              • Online AM: 9 February 2022
              • Accepted: 12 October 2021
              • Revised: 14 September 2021
              • Received: 30 June 2021
              Qualifiers

              • research-article
              • Refereed