Abstract
At the core of an efficient chip multiprocessor (CMP) are support for unicast and multicast routing, low implementation cost, and the ability to isolate concurrent applications while maximizing utilization of the CMP. We present an efficient logic-based unicast and multicast routing algorithm that guarantees isolation of local application traffic within any near-convex region on the chip, together with algorithms that recognize supported partitions and configure the cores accordingly. Evaluations show that the routing algorithm has a 57% more compact implementation than a recent multicast solution with the same coverage, and that it achieves 5% higher throughput with 13% lower latency.
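To make the idea of recognizing a supported partition more concrete, the following is a minimal sketch, not the paper's algorithm. The abstract does not define "near-convex", so the sketch assumes plain orthogonal convexity on a 2-D mesh as a stand-in: a region qualifies if the cores it occupies in every row and every column form one contiguous run. The function name `is_orthogonally_convex` and the `(x, y)` coordinate encoding are hypothetical.

```python
from collections import defaultdict


def is_orthogonally_convex(cells):
    """Return True if every row and every column of the region is one contiguous run.

    `cells` is an iterable of (x, y) mesh coordinates (an assumed encoding).
    Connectivity of the region as a whole is not checked here.
    """
    rows = defaultdict(list)  # y -> x coordinates present in that row
    cols = defaultdict(list)  # x -> y coordinates present in that column
    for x, y in cells:
        rows[y].append(x)
        cols[x].append(y)

    def contiguous(values):
        # A run without gaps spans exactly as many positions as it has members.
        unique = set(values)
        return max(unique) - min(unique) + 1 == len(unique)

    return (all(contiguous(v) for v in rows.values())
            and all(contiguous(v) for v in cols.values()))


# Illustrative regions (assumed examples, not taken from the paper).
rectangle = {(x, y) for x in range(3) for y in range(2)}
l_shape = {(0, 0), (0, 1), (0, 2), (1, 0), (2, 0)}
u_shape = {(0, 0), (0, 1), (1, 0), (2, 0), (2, 1)}  # gap at (1, 1)
```

Under this assumed definition the rectangle and the L-shape are accepted, while the U-shape is rejected because its upper row has a gap that traffic between the two arms would have to cross or route around.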
Index Terms
An efficient, low-cost routing framework for convex mesh partitions to support virtualization