research-article

Hipernetch: High-Performance FPGA Network Switch

Published: 30 November 2021

Abstract

We present Hipernetch, a novel FPGA-based design for high-bandwidth network switching. FPGAs have recently become more popular in data centers due to their promising capabilities for a wide range of applications. With the recent surge in transceiver bandwidth, they could further benefit the implementation and refinement of the network switches used in data centers. Hipernetch replaces the crossbar with a “combined parallel round-robin arbiter”. Unlike a crossbar, the combined parallel round-robin arbiter is easy to pipeline and does not require centralised iterative scheduling algorithms that try to fit too many steps into a single FPGA cycle or a few. The result is a network switch implementation on FPGAs that operates at a high frequency with a low port-to-port latency. The proposed Hipernetch architecture additionally provides competitive switching performance, approaching that of output-queued crossbar switches. Our Hipernetch implementations exhibit a throughput exceeding 100 Gbps per port for switches of up to 16 ports, reaching an aggregate throughput of around 1.7 Tbps.
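To make the round-robin arbitration idea concrete, the following is a minimal software model of a generic single round-robin arbiter, written in Python. It is not the paper's combined parallel round-robin arbiter, whose internals the abstract does not describe; it only illustrates the underlying fairness policy. It assumes requests are packed into an integer bit vector (bit i set means input i is requesting) and that a pointer marks the highest-priority input for the current cycle. The wrap-around is resolved with the common mask-and-two-priority-encoders technique, which maps to shallow combinational logic in hardware. The function names `lowest_set_bit` and `round_robin_grant` are hypothetical.

```python
from typing import Optional, Tuple


def lowest_set_bit(x: int) -> Optional[int]:
    """Return the index of the least-significant 1 bit of x, or None if x == 0.

    Models a fixed-priority encoder: the lowest index wins.
    """
    if x == 0:
        return None
    return (x & -x).bit_length() - 1


def round_robin_grant(requests: int, pointer: int, n: int) -> Tuple[Optional[int], int]:
    """Perform one arbitration step of an n-input round-robin arbiter.

    requests: bit vector, bit i set means input i requests service.
    pointer:  index of the input that has the highest priority this cycle.
    Returns (granted input or None, pointer for the next cycle).
    """
    full = (1 << n) - 1
    requests &= full
    # Keep only requests at indices >= pointer (the "masked" request vector).
    high_mask = full & ~((1 << pointer) - 1)
    masked = requests & high_mask
    # First priority encoder: lowest masked request, i.e. the first requester
    # at or after the pointer.
    grant = lowest_set_bit(masked)
    if grant is None:
        # Second priority encoder: no requester at or after the pointer,
        # so wrap around and take the lowest-indexed requester overall.
        grant = lowest_set_bit(requests)
    if grant is None:
        return None, pointer  # No requests: leave the pointer unchanged.
    # The granted input gets the lowest priority next cycle.
    return grant, (grant + 1) % n


if __name__ == "__main__":
    ptr = 0
    grants = []
    for _ in range(5):
        g, ptr = round_robin_grant(0b1111, ptr, 4)  # all four inputs request
        grants.append(g)
    print(grants)  # prints [0, 1, 2, 3, 0]
```

With four inputs all requesting, repeated calls grant inputs 0, 1, 2, 3 and then 0 again, so no input is starved. A switch scheduler would typically instantiate one such arbiter per output port and evaluate them in parallel; how Hipernetch combines and pipelines its arbiters goes beyond this sketch.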



Published in

ACM Transactions on Reconfigurable Technology and Systems, Volume 15, Issue 1
March 2022
262 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/3494949
Editor: Deming Chen

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 30 November 2021
• Accepted: 1 July 2021
• Revised: 1 March 2021
• Received: 1 July 2020


Qualifiers

• research-article
• Refereed
