Abstract
HopliteBuf is a deflection-free, low-cost, and high-speed FPGA overlay Network-on-Chip (NoC) with stall-free buffers. It uses an FPGA-friendly 2D unidirectional torus topology and is built on top of the HopliteRT overlay NoC. The stall-free buffers in HopliteBuf are supported by static analysis tools based on network calculus that determine worst-case FIFO occupancy bounds for a prescribed workload. We implement these FIFOs using cheap LUT SRAMs (Xilinx SRL32s and Intel MLABs) to reduce cost. HopliteBuf is a hybrid microarchitecture: its stall-free buffers provide the performance benefits of conventional buffered NoCs, while its lightweight unidirectional torus topology retains the cost advantages of deflection-routed NoCs. We present two design variants of the HopliteBuf NoC: (1) single corner-turn FIFO (W → S) and (2) dual corner-turn FIFO (W → S+N). The single corner-turn (W → S) design is simpler and only introduces a buffering requirement for packets changing dimension from the X ring to the downhill Y ring (West to South). The dual corner-turn variant requires two FIFOs, one for turning packets going downhill (W → S) and one for those going uphill (W → N). At the expense of a small increase in resource cost, the dual corner-turn design overcomes the mathematical analysis challenges that single corner-turn designs face for communication workloads with cyclic dependencies between flow traversal paths. Our static analysis delivers bounds that are not only better (in latency) than those of HopliteRT but also tighter by 2--3×. Across 100 randomly generated flowsets mapped to a 5×5 system size, HopliteBuf is able to route a larger fraction of these flowsets with <128-deep FIFOs, improve worst-case routing latency by ≈2× for mutually feasible flowsets, and support a 10% higher injection rate than HopliteRT. At 20% injection rates, HopliteRT is only able to route 1--2% of the flowsets, while HopliteBuf can deliver 40--50% sustainability. When compared to the W → S_bkp backpressure-based router, we observe that our HopliteBuf solution offers 25--30% better feasibility at 30--40% lower LUT cost.
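To illustrate how network calculus can bound FIFO occupancy, here is a minimal sketch of the textbook single-node result, not the paper's actual analysis tool. It assumes a flow constrained by a token-bucket arrival curve α(t) = b + r·t and a server offering a rate-latency service curve β(t) = R·max(0, t − T). Under those assumptions, and with r ≤ R, the backlog is at most b + r·T and the delay at most T + b/R. All class names, function names, and parameter values below are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenBucket:
    """Arrival curve alpha(t) = burst + rate * t (assumed flow model)."""
    burst: float  # b, in packets
    rate: float   # r, in packets per cycle


@dataclass(frozen=True)
class RateLatency:
    """Service curve beta(t) = rate * max(0, t - latency) (assumed server model)."""
    rate: float     # R, in packets per cycle
    latency: float  # T, in cycles


def backlog_bound(flow: TokenBucket, server: RateLatency) -> float:
    """Worst-case backlog in packets: b + r*T, unbounded if r > R."""
    if flow.rate > server.rate:
        return math.inf
    return flow.burst + flow.rate * server.latency


def delay_bound(flow: TokenBucket, server: RateLatency) -> float:
    """Worst-case delay in cycles: T + b/R, unbounded if r > R."""
    if flow.rate > server.rate:
        return math.inf
    return server.latency + flow.burst / server.rate


def min_fifo_depth(flow: TokenBucket, server: RateLatency) -> float:
    """Smallest whole-packet FIFO depth that covers the backlog bound."""
    bound = backlog_bound(flow, server)
    return math.inf if math.isinf(bound) else math.ceil(bound)


if __name__ == "__main__":
    # Hypothetical numbers: burst of 4 packets at 0.25 packets/cycle,
    # served at 0.5 packets/cycle after at most 8 cycles of latency.
    flow = TokenBucket(burst=4, rate=0.25)
    server = RateLatency(rate=0.5, latency=8)
    print(backlog_bound(flow, server), delay_bound(flow, server))
    print(min_fifo_depth(flow, server) <= 32)  # would fit a 32-deep FIFO
```

A FIFO sized to at least the backlog bound never overflows for traffic that respects the assumed arrival curve, which is the sense in which a buffer can be called stall-free. A multi-hop torus analysis would need to compose such bounds across routers, which this sketch does not attempt.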
HopliteBuf: Network Calculus-Based Design of FPGA NoCs with Provably Stall-Free FIFOs