Abstract
Emerging chips with hundreds and thousands of cores require networks with unprecedented energy/area efficiency and scalability. To address this, we propose Slim NoC (SN): a new on-chip network design that delivers significant improvements in efficiency and scalability compared to the state-of-the-art. The key idea is to use two concepts from graph and number theory, degree-diameter graphs combined with non-prime finite fields, to enable the smallest number of ports for a given core count. SN is inspired by state-of-the-art off-chip topologies; it identifies and distills their advantages for NoC settings while solving several key issues that lead to significant overheads on-chip. SN provides NoC-specific layouts, which further enhance area/energy efficiency. We show how to augment SN with state-of-the-art router microarchitecture schemes such as Elastic Links, to make the network even more scalable and efficient. Our extensive experimental evaluations show that SN outperforms both traditional low-radix topologies (e.g., meshes and tori) and modern high-radix networks (e.g., various Flattened Butterflies) in area, latency, throughput, and static/dynamic power consumption for both synthetic and real workloads. SN provides a promising direction in scalable and energy-efficient NoC topologies.
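The link between degree-diameter graphs and port count can be illustrated with the Moore bound, a standard graph-theory result that caps how many vertices a graph of a given maximum degree and diameter can have. The sketch below is not the SN construction; it only computes that bound and the smallest router-to-router port count the bound permits. The function names, the choice of diameter two, and the example router count of 200 are assumptions made for illustration.

```python
def moore_bound(degree: int, diameter: int) -> int:
    """Upper bound on the number of vertices in a graph with the given
    maximum degree and diameter (the Moore bound)."""
    if degree < 1 or diameter < 0:
        raise ValueError("degree must be >= 1 and diameter >= 0")
    total = 1          # the starting vertex
    frontier = degree  # vertices reachable in exactly one hop
    for _ in range(diameter):
        total += frontier
        frontier *= degree - 1  # each new vertex adds at most degree-1 others
    return total


def min_degree_for(routers: int, diameter: int = 2) -> int:
    """Smallest degree whose Moore bound reaches `routers` vertices.
    This is a lower bound on router-to-router ports, not a guarantee
    that a graph with that degree exists."""
    if routers < 1 or diameter < 1:
        raise ValueError("routers must be >= 1 and diameter >= 1")
    degree = 1
    while moore_bound(degree, diameter) < routers:
        degree += 1
    return degree


if __name__ == "__main__":
    # Hypothetical example: 200 routers, diameter two (illustrative values).
    print(min_degree_for(200, diameter=2))
```

For diameter two the bound reduces to degree² + 1, so a network of N routers needs at least about √N router-to-router ports per router; degree-diameter constructions aim to get close to this limit.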
Slim NoC: A Low-Diameter On-Chip Network Topology for High Energy Efficiency and Scalability. ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems.