
An efficient, low-cost routing framework for convex mesh partitions to support virtualization

Published: 03 July 2013

Abstract

At the core of an efficient chip multiprocessor (CMP) is support for unicast and multicast routing, low implementation cost, and the ability to isolate concurrent applications while maximizing utilization of the CMP. We present an efficient logic-based unicast and multicast routing algorithm that guarantees isolation of local application traffic within any near-convex region on the chip, together with algorithms to recognize supported partitions and configure the cores accordingly. Evaluations show that the routing algorithm has a 57% more compact implementation than a recent multicast solution with the same coverage, and that it achieves 5% higher throughput with 13% lower latency.
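To make the idea of recognizing a supported partition more concrete, the following is a minimal sketch of a convexity check on a 2D mesh. It is not the paper's recognition algorithm: the abstract does not define "near-convex", so the sketch assumes the simpler, stricter notion of orthogonal convexity (every row and every column of the region is one contiguous run, and the region is connected). The function name `is_orthogonally_convex` and the `(x, y)` coordinate convention are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple

Node = Tuple[int, int]  # (x, y) position of a core in the mesh (assumed convention)


def is_orthogonally_convex(region: Iterable[Node]) -> bool:
    """Check an assumed convexity criterion for a mesh partition.

    Returns True when every row and every column of the region is a single
    contiguous run of cores and the region is 4-connected. This stands in for
    the paper's "near-convex" test, which the abstract does not specify.
    """
    cells = set(region)
    if not cells:
        return False

    # Group the x coordinates by row and the y coordinates by column.
    rows: Dict[int, List[int]] = {}
    cols: Dict[int, List[int]] = {}
    for x, y in cells:
        rows.setdefault(y, []).append(x)
        cols.setdefault(x, []).append(y)

    # A run of distinct integers is contiguous iff its span equals its length.
    for line in list(rows.values()) + list(cols.values()):
        if max(line) - min(line) + 1 != len(line):
            return False

    # Breadth-first search over mesh links to confirm the region is connected.
    start = next(iter(cells))
    seen = {start}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt in cells and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == len(cells)
```

Under this assumed definition, a rectangle or an L-shaped set of cores passes, while a U-shaped set fails because one of its rows is split into two runs. A check of this kind would run before cores are configured for a partition; how the paper relaxes it to near-convex shapes is not stated in the abstract.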

