research-article

A Segmented Adaptive Router for Near Energy-Proportional Networks-on-Chip

Published: 23 August 2022

Abstract

A Network-on-Chip (NoC) is an essential component of a chip multiprocessor (CMP), but it accounts for a large fraction of system energy. Because traffic across a NoC is unpredictable, NoC resources are frequently over-provisioned at considerable cost, which in turn contributes significantly to CMP power consumption. A body of work addresses this issue; so far, however, existing solutions fall short of reducing power while maintaining high NoC performance. This paper proposes combining router architecture optimizations with smart resource management to overcome this limitation. Based on a fully segmented architecture, we present an online adaptive router that adjusts its active routing resources to meet the current traffic demand. This enhanced power-gating strategy significantly decreases both the static and dynamic power consumption of the NoC, by up to 70% for synthetic traffic patterns and up to 58% for real traffic workloads, while preserving NoC latency and throughput. Thanks to these adaptive power-saving mechanisms, the proposed segmented NoC router provides near energy-proportional operation across the range of benchmarks used.
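
As a rough illustration of what near energy-proportional operation means, the following toy model compares a router that keeps every segment powered with one that enables only as many segments as the offered load requires. It is a minimal sketch and not the router described in this article: the function names, the ceiling-based activation policy, and the power constants are all hypothetical, and only static power is scaled by the number of active segments.

```python
import math


def active_segments(load, total_segments):
    """Hypothetical policy: power on just enough segments for the offered load.

    `load` is the offered traffic as a fraction of peak capacity (0.0 to 1.0).
    """
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be in [0, 1]")
    # Assumption: at least one segment stays on so the router remains reachable.
    return max(1, math.ceil(load * total_segments))


def router_power(load, total_segments, static_per_segment=1.0,
                 dynamic_at_full_load=4.0, adaptive=True):
    """Toy power model in arbitrary units (all constants are illustrative)."""
    on = active_segments(load, total_segments) if adaptive else total_segments
    static = on * static_per_segment        # leakage of powered-on segments
    dynamic = load * dynamic_at_full_load   # switching activity tracks traffic
    return static + dynamic


if __name__ == "__main__":
    for load in (0.0, 0.25, 0.5, 1.0):
        always_on = router_power(load, 4, adaptive=False)
        adaptive = router_power(load, 4, adaptive=True)
        print(f"load={load:.2f}  always-on={always_on:.1f}  adaptive={adaptive:.1f}")
```

In this model the always-on router pays its full static cost even when idle, whereas the adaptive one draws power that follows the load much more closely, which is the behaviour the term energy-proportional refers to.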

          • Published in

            ACM Transactions on Embedded Computing Systems, Volume 21, Issue 4
            July 2022
            330 pages
            ISSN: 1539-9087
            EISSN: 1558-3465
            DOI: 10.1145/3551651
            • Editor: Tulika Mitra

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 23 August 2022
            • Online AM: 12 April 2022
            • Accepted: 1 March 2022
            • Revised: 1 February 2022
            • Received: 1 November 2021

            Qualifiers

            • research-article
            • Refereed
