System-Level Performance and Power Optimization for MPSoC: A Memory Access-Aware Approach

Published: 21 January 2015

Abstract

As the number of IPs in a multimedia Multi-Processor System-on-Chip (MPSoC) continues to increase, concurrent memory accesses from different IPs increasingly stress the memory system, which presents both opportunities and challenges for future MPSoC design. The impact of this memory pressure on system-level MPSoC design is twofold. First, contention among IPs prolongs memory access time, which exacerbates the persistent memory wall problem. Second, longer memory accesses lead to longer IP stall time, during which leakage energy is wasted. In this article, we propose two memory access-aware system-level design approaches for performance and leakage optimization. To alleviate the memory wall problem, we propose a Hierarchical Memory Scheduling (HMS) policy that schedules memory requests from the same IP and application consecutively, reducing interference among memory accesses from different IPs while guaranteeing fairness. To reduce IP leakage waste due to long memory accesses, we propose a memory access-aware power-gating policy. A straightforward power-gating approach is to power gate an IP whenever it needs to fetch data from memory. However, because response time varies among memory accesses, aggressively power gating an IP whenever a memory request occurs may result in incorrect power-gating decisions. The proposed memory access-aware power-gating policy makes these decisions judiciously, based on the predicted memory latency of an individual IP and its energy breakeven time. The experimental results show that the proposed HMS policy improves system throughput by 42% compared to First-Come-First-Serve (FCFS) and by 21% compared to First-Ready First-Come-First-Serve (FR-FCFS) on an MPSoC for mobile phones. HMS also improves fairness by 1.52× compared to FCFS and by 1.23× compared to FR-FCFS.
For leakage optimization, our memory access-aware power-gating mechanism improves energy savings by 3.88× and reduces the performance penalty by 70% compared to conventional timeout-based power gating. We further demonstrate that our HMS memory scheduler can regulate memory access order, thereby reducing memory response time variation. This leads to more accurate power-down decisions for both conventional timeout-based power gating and the proposed memory access-aware power gating.
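
The power-gating decision described in the abstract compares an IP's predicted memory latency against its energy breakeven time. The following is a minimal sketch of that comparison only, not the article's implementation. The abstract does not say how latency is predicted, so the exponential moving average used here is an assumption, and the names `IPState`, `breakeven_cycles`, `alpha`, `observe`, and `should_power_gate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IPState:
    """Per-IP bookkeeping for a latency-aware power-gating decision (illustrative)."""

    breakeven_cycles: float          # hypothetical: energy breakeven time of this IP, in cycles
    predicted_latency: float = 0.0   # running estimate of memory latency, in cycles
    alpha: float = 0.5               # hypothetical smoothing weight (assumed predictor)

    def observe(self, latency_cycles: float) -> None:
        """Fold one completed memory access into the latency estimate."""
        self.predicted_latency = (
            self.alpha * latency_cycles
            + (1.0 - self.alpha) * self.predicted_latency
        )

    def should_power_gate(self) -> bool:
        """Gate only when the predicted stall is longer than the breakeven time."""
        return self.predicted_latency > self.breakeven_cycles


if __name__ == "__main__":
    ip = IPState(breakeven_cycles=100.0)
    for latency in (40, 60, 400):
        ip.observe(latency)
        print(latency, ip.predicted_latency, ip.should_power_gate())
```

With short observed latencies the estimate stays below the breakeven time and the IP is left powered; a long access pushes the estimate above it and the next stall becomes a gating candidate. A real design would also need to account for wake-up latency, which this sketch omits.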
