research-article

Hoplite: A Deflection-Routed Directional Torus NoC for FPGAs

Published: 22 March 2017

Abstract

We can design an FPGA-optimized lightweight network-on-chip (NoC) router for flit-oriented packet-switched communication that is an order of magnitude smaller (in terms of LUTs and FFs) than state-of-the-art FPGA overlay routers available today. We present Hoplite, an efficient, lightweight, and fast FPGA overlay NoC that is designed to be small and compact by (1) using deflection routing instead of buffered switching to eliminate expensive FIFO buffers and (2) using a directional torus topology to reduce the cost of the switch crossbar. Buffering and crossbar implementation complexities have traditionally limited speeds and imposed heavy resource costs in conventional FPGA overlay NoCs. We take care to exploit the fracturable lookup table (LUT) organization of the FPGA to further improve the resource efficiency of mapping the expensive crossbar multiplexers. For uniform random traffic, Hoplite can outperform classic, bidirectional, buffered mesh networks for single-flit-oriented FPGA applications by as much as 1.5× (comparing best achievable throughputs for a 10×10 system) or 2.5× (allocating the same amount of FPGA resources to both NoCs). When compared to buffered mesh switches, FPGA-based deflection routers are ≈3.5× smaller (HLS-generated switch) and 2.5× faster (clock period) for 32b payloads. In a separate experiment, we hand-crafted an RTL version of our switch with location constraints that requires only 60 LUTs and 100 FFs per router and achieves a 2.9 ns clock period (about 345 MHz). We conduct additional layout experiments on modern Xilinx and Altera FPGAs and demonstrate wide-channel chip-spanning layouts that run in excess of 300 MHz while consuming 10-15% of overall chip resources. We also demonstrate a clustered RISC-V multiprocessor organization that uses Hoplite to help deliver the high processing throughputs of the FPGA architecture to user applications.
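To make the idea of bufferless deflection routing on a directional torus concrete, here is a minimal Python sketch of a single router's per-cycle decision. It is an illustration under stated assumptions, not the authors' RTL: the abstract does not give the arbitration policy, so the sketch assumes dimension-ordered routing (X ring first, then Y ring), a south port that doubles as the exit to the local client, and static priority for northern traffic on that port. The names `Flit` and `route` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Flit:
    """A single-flit packet addressed to router (dst_x, dst_y)."""
    dst_x: int
    dst_y: int
    payload: int = 0


def route(
    x: int,
    y: int,
    west: Optional[Flit] = None,
    north: Optional[Flit] = None,
    pe: Optional[Flit] = None,
) -> Tuple[Optional[Flit], Optional[Flit], Optional[Flit], bool]:
    """One cycle of an assumed deflection policy at router (x, y).

    Returns (east_out, south_out, delivered, injected).
    There are no FIFOs: every arriving flit leaves in the same cycle.
    """
    east: Optional[Flit] = None
    south: Optional[Flit] = None

    # Assumption: traffic already on the Y ring always wins the south port.
    if north is not None:
        south = north

    if west is not None:
        if west.dst_x == x and south is None:
            south = west  # reached its column: turn south (or exit)
        else:
            # Either still travelling in X, or deflected because the
            # south port is busy; a deflected flit circles the X ring.
            east = west

    # The local client may inject only if its desired port is idle.
    injected = False
    if pe is not None:
        if pe.dst_x != x and east is None:
            east, injected = pe, True
        elif pe.dst_x == x and south is None:
            south, injected = pe, True

    # Assumption: the exit to the client shares the south port.
    delivered: Optional[Flit] = None
    if south is not None and (south.dst_x, south.dst_y) == (x, y):
        delivered, south = south, None

    return east, south, delivered, injected
```

In this sketch the only contention is for the south port, so a two-output switch is enough and no input buffering is needed. The price is that a deflected flit takes an extra trip around its X ring before it can try to turn again.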

Published in

ACM Transactions on Reconfigurable Technology and Systems, Volume 10, Issue 2
Special Section on Field Programmable Logic and Applications 2015 and Regular Papers
June 2017
133 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/3068424
Editor: Steve Wilton

Copyright © 2017 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 22 March 2017
• Accepted: 1 December 2016
• Revised: 1 November 2016
• Received: 1 April 2016

Qualifiers

• research-article
• Research
• Refereed
