A network congestion-aware memory subsystem for manycore

Published: 03 July 2013

Abstract

The network-on-chip (NoC) plays a crucial role in memory performance because it carries most of the traffic to and from the DRAM memory controllers. However, there has been little work on the interplay between the NoC and memory controllers. In this article, we address a problem called network congestion-induced memory blocking and propose a novel memory controller that performs memory access scheduling and network entry control in a network congestion-aware manner. When the network is congested, data bound for congested regions of the NoC can block the controller and degrade performance; to avoid this, the proposed memory controller favors requests and data associated with uncongested regions. Because such a policy can be unfair to requests bound for congested regions, we also propose a gradual method that enables a trade-off between performance (memory utilization) and fairness (memory access latency). Experimental results show that the proposed method offers up to 1.76 to 2.99 times improvement in memory utilization in latency-tolerant designs.
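The idea of favoring uncongested regions while bounding unfairness can be illustrated with a small sketch. This is not the scheduler from the article, whose details the abstract does not give. It is a minimal toy model under stated assumptions: the `Request` type, the `pick_next` function, the set of congested region IDs, and the `max_wait` knob are all hypothetical names introduced here for illustration. The sketch picks the oldest request bound for an uncongested region, unless the oldest request overall has already waited `max_wait` cycles, in which case that request is served regardless of congestion.

```python
from dataclasses import dataclass


@dataclass
class Request:
    req_id: int
    dest_region: int  # NoC region the response data must travel to (assumed model)
    arrival: int      # cycle at which the request reached the controller


def pick_next(queue, congested, now, max_wait):
    """Return the request to serve next, or None if the queue is empty.

    Hypothetical policy, not the article's algorithm:
    - If the oldest request has waited at least max_wait cycles, serve it
      (fairness bound on access latency).
    - Otherwise serve the oldest request whose destination region is not
      in the congested set (favors utilization).
    - If every request targets a congested region, fall back to the oldest.
    """
    if not queue:
        return None
    oldest = min(queue, key=lambda r: r.arrival)
    if now - oldest.arrival >= max_wait:
        return oldest
    uncongested = [r for r in queue if r.dest_region not in congested]
    if uncongested:
        return min(uncongested, key=lambda r: r.arrival)
    return oldest


# Example: region 1 is congested, so request 1 (bound for region 2) is preferred
# while request 0 is still within its waiting budget.
queue = [Request(0, 1, 0), Request(1, 2, 5), Request(2, 1, 6)]
chosen = pick_next(queue, congested={1}, now=10, max_wait=100)
```

In this toy model, `max_wait` plays the role of a trade-off knob: a value of 0 degenerates to first-come-first-served (best fairness), while a very large value always prefers uncongested regions (best utilization, worst latency for congested destinations).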
