Abstract
As the number of IPs in a multimedia Multi-Processor System-on-Chip (MPSoC) continues to increase, concurrent memory accesses from different IPs increasingly stress the memory system, which presents both opportunities and challenges for future MPSoC design. The impact on system-level MPSoC design is twofold. First, contention among IPs prolongs memory access time, which exacerbates the persistent memory wall problem. Second, longer memory accesses lead to longer IP stall times, which waste leakage energy. In this article, we propose two memory access-aware system-level design approaches for performance and leakage optimization. To alleviate the memory wall problem, we propose a Hierarchical Memory Scheduling (HMS) policy that schedules memory requests from the same IP and application consecutively, reducing interference among memory accesses from different IPs while guaranteeing fairness. To reduce the IP leakage wasted during long memory accesses, we propose a memory access-aware power-gating policy. A straightforward approach is to power gate an IP whenever it needs to fetch data from memory. However, because response time varies among memory accesses, aggressively power gating an IP on every memory request may result in incorrect power-gating decisions. The proposed policy instead makes these decisions judiciously, based on the predicted memory latency of an individual IP and its energy breakeven time. Experimental results on an MPSoC for mobile phones show that HMS improves system throughput by 42% compared to First-Come-First-Serve (FCFS) and by 21% compared to First-Ready First-Come-First-Serve (FR-FCFS). HMS also improves fairness by 1.52× compared to FCFS and by 1.23× compared to FR-FCFS. For leakage optimization, our memory access-aware power-gating mechanism improves energy savings by 3.88× and reduces the performance penalty by 70% compared to conventional timeout-based power gating. We further demonstrate that the HMS memory scheduler regulates memory access order, thereby reducing memory response time variation. This leads to more accurate power-down decisions for both conventional timeout-based power gating and the proposed memory access-aware power gating.
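The power-gating decision described in the abstract can be illustrated with a minimal sketch. The abstract states only that the decision depends on an IP's predicted memory latency and its energy breakeven time; it does not say how latency is predicted. The exponential moving average below, and the names `IPBlock`, `LatencyPredictor`, and `should_power_gate`, are hypothetical and chosen purely for illustration; they are not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class IPBlock:
    """Hypothetical per-IP parameters (names and units are assumptions)."""
    name: str
    breakeven_cycles: float  # minimum idle time for gating to save net energy


class LatencyPredictor:
    """Assumed predictor: exponential moving average of observed latencies.

    The abstract does not specify the prediction method; this is a stand-in.
    """

    def __init__(self, alpha: float = 0.5, initial: float = 0.0) -> None:
        self.alpha = alpha
        self.estimate = initial

    def update(self, observed_cycles: float) -> None:
        # Blend the newest observation with the running estimate.
        self.estimate = (self.alpha * observed_cycles
                         + (1.0 - self.alpha) * self.estimate)


def should_power_gate(ip: IPBlock, predicted_latency_cycles: float) -> bool:
    """Gate only when the predicted stall exceeds the IP's breakeven time."""
    return predicted_latency_cycles > ip.breakeven_cycles


# Usage: a short predicted stall is not worth gating; a long one is.
decoder = IPBlock(name="video_decoder", breakeven_cycles=100.0)
predictor = LatencyPredictor(alpha=0.5, initial=80.0)
gate_before = should_power_gate(decoder, predictor.estimate)
predictor.update(200.0)  # a slow, contended access raises the estimate
gate_after = should_power_gate(decoder, predictor.estimate)
```

In this sketch, gating on every memory request would correspond to `should_power_gate` always returning `True`; comparing against the breakeven time is what filters out stalls too short to recover the energy cost of powering down.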
Index Terms
System-Level Performance and Power Optimization for MPSoC: A Memory Access-Aware Approach