Abstract
We can overcome the pessimism in worst-case routing latency analysis of timing-predictable Network-on-Chip (NoC) workloads by single-digit factors through the use of a hybrid field-programmable gate array (FPGA)–optimized NoC and workload-adapted regulation. Timing-predictable FPGA-optimized NoCs such as HopliteBuf integrate stall-free FIFOs that are sized using offline static analysis of a user-supplied flow pattern and rates. For certain bursty traffic and flow configurations, static analysis delivers very large, sometimes infeasible, FIFO size bounds and large worst-case latency bounds. Alternatively, backpressure-based NoCs such as HopliteBP can operate with lower latencies for certain bursty flows. However, they suffer from severe pessimism in the analysis due to the effect of pipelining of packets and interleaving of flows at switch ports. As we show in this article, a hybrid FPGA NoC that seamlessly composes both design styles on a per-switch basis delivers the best of both worlds, with improved feasibility (bounded operation) and tighter latency bounds. We select the NoC switch configuration through a novel evolutionary algorithm based on Maximum Likelihood Estimation (MLE). For synthetic (
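The sensitivity of statically sized FIFOs to bursty traffic can be illustrated with a minimal sketch. It assumes a textbook network-calculus model that the abstract does not spell out: each flow is regulated by a token bucket (burst plus sustained rate) and each switch port offers a rate-latency service guarantee. The class names, function names, and numbers below are hypothetical and are not taken from HopliteBuf or HopliteBP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenBucket:
    """Arrival curve alpha(t) = burst + rate * t (assumed flow model)."""
    burst: float  # packets that may arrive back to back
    rate: float   # sustained packets per cycle


@dataclass(frozen=True)
class RateLatency:
    """Service curve beta(t) = rate * max(0, t - latency) (assumed port model)."""
    rate: float     # packets per cycle the port is guaranteed to drain
    latency: float  # cycles before guaranteed service starts


def backlog_bound(flow: TokenBucket, port: RateLatency) -> float:
    """Worst-case queue occupancy, i.e. the FIFO depth needed to never stall."""
    if flow.rate > port.rate:
        return float("inf")  # overloaded port: no finite FIFO is enough
    return flow.burst + flow.rate * port.latency


def delay_bound(flow: TokenBucket, port: RateLatency) -> float:
    """Worst-case waiting time of a packet at this port, in cycles."""
    if flow.rate > port.rate:
        return float("inf")
    return port.latency + flow.burst / port.rate


# Two flows with the same sustained rate but very different burstiness.
port = RateLatency(rate=0.5, latency=4.0)
smooth = TokenBucket(burst=1.0, rate=0.25)
bursty = TokenBucket(burst=32.0, rate=0.25)

for name, flow in (("smooth", smooth), ("bursty", bursty)):
    print(name, backlog_bound(flow, port), delay_bound(flow, port))
```

Under these assumptions both bounds grow linearly with the burst size, which is one way to see why tightening a flow's regulator or switching a port to backpressure can be preferable to provisioning a very deep FIFO.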
Index Terms
HopliteML: Evolving Application Customized FPGA NoCs with Adaptable Routers and Regulators