research-article

HopliteML: Evolving Application Customized FPGA NoCs with Adaptable Routers and Regulators

Published: 08 August 2022

Abstract

We can overcome the pessimism in worst-case routing latency analysis of timing-predictable Network-on-Chip (NoC) workloads by single-digit factors through the use of a hybrid field-programmable gate array (FPGA)–optimized NoC and workload-adapted regulation. Timing-predictable FPGA-optimized NoCs such as HopliteBuf integrate stall-free FIFOs that are sized using offline static analysis of a user-supplied flow pattern and rates. For certain bursty traffic and flow configurations, static analysis delivers very large, sometimes infeasible, FIFO size bounds and large worst-case latency bounds. Alternatively, backpressure-based NoCs such as HopliteBP can operate with lower latencies for certain bursty flows. However, they suffer from severe pessimism in the analysis due to the effect of pipelining of packets and interleaving of flows at switch ports. As we show in this article, a hybrid FPGA NoC that seamlessly composes both design styles on a per-switch basis delivers the best of both worlds, with improved feasibility (bounded operation) and tighter latency bounds. We select the NoC switch configuration through a novel evolutionary algorithm based on Maximum Likelihood Estimation (MLE). For synthetic (RANDOM, LOCAL) and real-world (SpMV, Graph) workloads, we demonstrate ≈2–3× improvements in feasibility and ≈1–6.8× in worst-case latency while requiring an LUT cost only ≈1–1.5× larger than the cheapest HopliteBuf solution. We also deploy and verify our NoC (PL) and MLE framework (PS) on a Pynq-Z1 to adapt and reconfigure NoC switches dynamically. We can further improve a workload’s routability by learning to surgically tune regulation rates for each traffic trace to maximize available routing bandwidth. We capture critical dependency between traces by modelling the regulation space as a multivariate Gaussian distribution and learn the distribution’s parameters using Covariance Matrix Adaptation Evolution Strategy (CMA-ES). 
We also propose nested learning, which learns switch configurations and regulation rates in tandem. Compared with stand-alone switch learning, this symbiotic nested learning achieves ≈1.5× lower cost-constrained latency, ≈3.1× faster individual rates, and ≈1.4× faster mean rates. We also evaluate improvements to vanilla NoCs’ routing using only stand-alone rate learning (no switch learning), observing ≈1.6× lower latency across synthetic and real-world benchmarks.
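
The idea of modelling per-trace regulation rates as a multivariate Gaussian and learning its parameters from sampled candidates can be illustrated with a minimal sketch. This is not CMA-ES itself and not the authors' implementation: it is a simplified estimation-of-distribution loop that refits the mean and covariance by maximum likelihood on the best samples of each generation. The cost function `toy_cost`, the shared link capacity of 1.0, and all names and default parameters below are hypothetical stand-ins chosen for illustration.

```python
import math
import random


def cholesky(a):
    """Lower-triangular factor of a small symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(max(a[i][i] - s, 1e-12))
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample(mean, chol, rng):
    """Draw one candidate rate vector from N(mean, chol * chol^T)."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(chol[i][k] * z[k] for k in range(i + 1))
            for i, m in enumerate(mean)]


def fit_gaussian(points):
    """Maximum-likelihood mean and covariance (small ridge keeps it factorable)."""
    n, d = len(points), len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(d)]
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / n
            + (1e-6 if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    return mean, cov


def clip(rates):
    """Regulation rates are treated as fractions of link bandwidth in [0, 1]."""
    return [min(max(r, 0.0), 1.0) for r in rates]


def toy_cost(rates):
    """Hypothetical cost: penalise oversubscribing one shared link of capacity
    1.0, and reward a high minimum rate across flows (lower is better)."""
    r = clip(rates)
    overload = max(0.0, sum(r) - 1.0)
    return 10.0 * overload - min(r)


def learn_rates(num_flows=3, population=60, elite=15, generations=40, seed=0):
    """Sample candidates, keep the elite, refit the Gaussian, repeat."""
    rng = random.Random(seed)
    mean = [0.5] * num_flows
    cov = [[0.04 if i == j else 0.0 for j in range(num_flows)]
           for i in range(num_flows)]
    best, best_cost = clip(mean), toy_cost(mean)
    for _ in range(generations):
        chol = cholesky(cov)
        candidates = (sample(mean, chol, rng) for _ in range(population))
        scored = sorted((toy_cost(x), x) for x in candidates)
        if scored[0][0] < best_cost:
            best_cost, best = scored[0][0], clip(scored[0][1])
        mean, cov = fit_gaussian([x for _, x in scored[:elite]])
    return best, best_cost


if __name__ == "__main__":
    rates, cost = learn_rates()
    print("learned rates:", [round(r, 3) for r in rates], "cost:", round(cost, 3))
```

Because the covariance is refit from the elite samples, off-diagonal terms can capture the dependency between flows that share a link: raising one rate forces others down. A full CMA-ES additionally adapts a global step size and uses evolution paths, which this sketch omits.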



        • Published in

          ACM Transactions on Reconfigurable Technology and Systems  Volume 15, Issue 4
          December 2022
          476 pages
          ISSN:1936-7406
          EISSN:1936-7414
          DOI:10.1145/3540252
          • Editor: Deming Chen

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 8 August 2022
          • Online AM: 14 February 2022
          • Accepted: 1 December 2021
          • Revised: 1 October 2021
          • Received: 1 July 2021

          Qualifiers

          • research-article
          • Refereed