Abstract
A Network-on-Chip (NoC) is an essential component of a chip multiprocessor (CMP), yet it accounts for a large fraction of system energy. Because traffic across a NoC is unpredictable, NoC resources are frequently over-provisioned at considerable cost, which in turn makes the NoC a significant contributor to CMP power consumption. A body of work addresses this issue, but existing solutions fall short of reducing power while maintaining high NoC performance. This paper proposes combining router architecture optimizations with smart resource management to overcome this limitation. Based on a fully segmented architecture, we present an online adaptive router that adjusts its active routing resources to meet the current traffic demand. This enhanced power-gating strategy significantly decreases both the static and the dynamic power consumption of the NoC, by up to 70% for synthetic traffic patterns and by up to 58% for real traffic workloads, while preserving NoC latency and throughput. Thanks to these adaptive power-saving mechanisms, the proposed segmented NoC router provides near energy-proportional operation across the range of benchmarks used.
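To make the idea of energy-proportional operation concrete, the following is a minimal toy sketch, not the router design described in the paper. It assumes a router split into a hypothetical number of equal segments, each with a fixed static power cost, and a dynamic power term that grows linearly with load. The segment count, the power values, and the function names are all illustrative assumptions. The toy captures only the static-power effect of gating unused segments, although the abstract reports savings in dynamic power as well.

```python
import math


def active_segments(load, total_segments=4):
    """Return the fewest segments whose combined capacity covers `load`.

    `load` is the offered traffic as a fraction of full router capacity (0..1).
    Assumption: every segment provides an equal share of that capacity.
    """
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be within [0, 1]")
    # Keep at least one segment powered so the router stays reachable.
    return max(1, math.ceil(load * total_segments))


def router_power(load, total_segments=4, static_per_segment=1.0,
                 dynamic_at_full_load=4.0, adaptive=True):
    """Toy power model in arbitrary units (all constants are hypothetical).

    Static power scales with the number of powered segments; dynamic power
    scales linearly with load. With adaptive=False, all segments stay on.
    """
    powered = active_segments(load, total_segments) if adaptive else total_segments
    return powered * static_per_segment + load * dynamic_at_full_load


if __name__ == "__main__":
    for load in (0.0, 0.25, 0.5, 1.0):
        print(load, router_power(load, adaptive=False), router_power(load))
```

In this toy model the always-on router pays the full static cost even when idle, whereas the adaptive variant's power tracks the offered load much more closely, which is the sense in which a design can be called near energy-proportional.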
A Segmented Adaptive Router for Near Energy-Proportional Networks-on-Chip