research-article

Two Reconfigurable NDP Servers: Understanding the Impact of Near-Data Processing on Data Center Applications

Published: 15 October 2021

Abstract

Existing near-data processing (NDP)-powered architectures have demonstrated their strength for some data-intensive applications. Data center servers, however, must serve compute-intensive applications as well as data-intensive ones, so an in-depth understanding of the impact of NDP on various data center applications is still needed. For example, can a compute-intensive application also benefit from NDP? In addition, current NDP techniques focus on maximizing the data processing rate by utilizing all computing resources at all times. Is this “always running in full gear” strategy consistently beneficial for an application? To answer these questions, we first propose two reconfigurable NDP-powered servers called RANS (Reconfigurable ARM-based NDP Server) and RFNS (Reconfigurable FPGA-based NDP Server). Next, we implement a single-engine prototype for each of them based on a conventional data center and then evaluate their effectiveness. Experimental results measured from the two prototypes are then extrapolated to estimate the properties of the two full-size reconfigurable NDP servers. Finally, we present several new findings. For example, we find that while RANS can only benefit data-intensive applications, RFNS can benefit both data-intensive and compute-intensive applications. Moreover, we find that for certain applications the reconfigurability of RANS/RFNS can deliver noticeable energy-efficiency gains without any performance degradation.

Published in

ACM Transactions on Storage, Volume 17, Issue 4
November 2021
201 pages
ISSN: 1553-3077
EISSN: 1553-3093
DOI: 10.1145/3487989
Editor: Sam H. Noh

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 15 October 2021
• Accepted: 1 April 2021
• Revised: 1 March 2021
• Received: 1 July 2020

            Qualifiers

            • research-article
            • Refereed
