Abstract
We present Hipernetch, a novel FPGA-based design for performing high-bandwidth network switching. FPGAs have recently become more popular in data centers due to their promising capabilities for a wide range of applications. With the recent surge in transceiver bandwidth, they could further benefit the implementation and refinement of network switches used in data centers. Hipernetch replaces the crossbar with a “combined parallel round-robin arbiter”. Unlike a crossbar, the combined parallel round-robin arbiter is easy to pipeline and does not require centralised iterative scheduling algorithms that try to fit too many steps into one or a few FPGA cycles. The result is a network switch implementation on FPGAs that operates at a high frequency with low port-to-port latency. The proposed Hipernetch architecture additionally provides competitive switching performance, approaching that of output-queued crossbar switches. Our implemented Hipernetch designs exhibit a throughput that exceeds 100 Gbps per port for switches of up to 16 ports, reaching an aggregate throughput of around 1.7 Tbps.