Abstract
Existing near-data processing (NDP)-powered architectures have demonstrated their strength for some data-intensive applications. Data center servers, however, have to serve not only data-intensive but also compute-intensive applications. An in-depth understanding of the impact of NDP on various data center applications is still needed. For example, can a compute-intensive application also benefit from NDP? In addition, current NDP techniques focus on maximizing the data processing rate by keeping all computing resources busy at all times. Is this “always running in full gear” strategy consistently beneficial for an application? To answer these questions, we first propose two reconfigurable NDP-powered servers called RANS (Reconfigurable ARM-based NDP Server) and RFNS (Reconfigurable FPGA-based NDP Server). Next, we implement a single-engine prototype of each on a conventional data center and evaluate its effectiveness. Experimental results measured from the two prototypes are then extrapolated to estimate the properties of the two full-size reconfigurable NDP servers. Finally, we present several new findings. For example, we find that while RANS benefits only data-intensive applications, RFNS benefits both data-intensive and compute-intensive applications. Moreover, we find that for certain applications the reconfigurability of RANS/RFNS improves energy efficiency noticeably without any performance degradation.
Two Reconfigurable NDP Servers: Understanding the Impact of Near-Data Processing on Data Center Applications