Abstract
The network-on-chip (NoC) plays a crucial role in memory performance because it carries most of the traffic to and from the DRAM memory controllers. However, there has been little work on the interplay between the NoC and the memory controllers. In this article, we address a problem called network congestion-induced memory blocking, in which the network port of a memory controller is blocked by congestion that back-propagates through the NoC. We propose a novel memory controller that performs memory access scheduling and network entry control in a network congestion-aware manner. Under network congestion, the proposed memory controller favors requests and data associated with uncongested regions, so that data bound for congested regions of the NoC does not block the controller and degrade performance. Because such a policy can be unfair to requests associated with congested regions, we also propose a gradual method that enables a trade-off between performance (in terms of memory utilization) and fairness (in terms of memory access latency). Experimental results show that the proposed method improves memory utilization by up to 1.76× to 2.99× in latency-tolerant designs.
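The scheduling idea described above can be illustrated with a minimal Python sketch. It rests on stated assumptions and is not the authors' controller. The `Request` fields, the per-region `congested` map and the `age_limit` parameter are all hypothetical. The age cap is only one simple way to expose a performance/fairness knob, and the article's gradual method may work differently. Network entry control and DRAM timing (row-buffer hits, bank state) are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Request:
    """A pending memory request (hypothetical fields for illustration)."""
    req_id: int
    arrival: int  # cycle at which the request entered the controller queue
    region: int   # NoC region its response data must travel to (assumed known)


def pick_next(queue: List[Request], congested: Dict[int, bool],
              now: int, age_limit: int) -> Optional[Request]:
    """Pick the next request to serve under a congestion-aware policy.

    age_limit is the hypothetical fairness knob: 0 degenerates to plain
    oldest-first, while a very large value gives strict uncongested-first.
    """
    if not queue:
        return None
    oldest_first = sorted(queue, key=lambda r: r.arrival)
    oldest = oldest_first[0]
    # Starvation guard: once the oldest request has waited age_limit cycles,
    # serve it even if its data is bound for a congested region.
    if now - oldest.arrival >= age_limit:
        return oldest
    # Otherwise prefer the oldest request whose region is not congested;
    # regions missing from the map are treated as uncongested.
    for r in oldest_first:
        if not congested.get(r.region, False):
            return r
    # Every pending request targets a congested region: fall back to oldest.
    return oldest


if __name__ == "__main__":
    queue = [
        Request(0, arrival=0, region=1),
        Request(1, arrival=2, region=0),
        Request(2, arrival=3, region=1),
    ]
    congested = {0: False, 1: True}
    # Oldest request has waited 5 < 8 cycles: uncongested region 0 wins.
    print(pick_next(queue, congested, now=5, age_limit=8).req_id)  # 1
    # Oldest request has now waited 10 >= 8 cycles: served despite congestion.
    print(pick_next(queue, congested, now=10, age_limit=8).req_id)  # 0
```

In this model, lowering `age_limit` bounds the worst-case wait of congested-region requests. The cost is that more data bound for congested regions is pushed toward the controller's network port. Raising `age_limit` reverses that trade-off.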
A network congestion-aware memory subsystem for manycore