Dynamic data scratchpad memory management for a memory subsystem with an MMU

Published: 13 June 2007
DOI: 10.1145/1254766.1254804

ABSTRACT

In this paper, we propose a dynamic scratchpad memory (SPM) management technique for a horizontally-partitioned memory subsystem with an MMU. The memory subsystem consists of a relatively cheap direct-mapped data cache and SPM. Our technique loads required global data and stack pages into the SPM on demand when a function is called. A scratchpad memory manager loads/unloads the data pages and maintains a page table for the MMU. Our approach is based on post-pass analysis and optimization techniques, and it handles the whole program including libraries. The data page mapping is determined by solving an integer linear programming (ILP) formulation that approximates our demand paging technique. The ILP model uses a dynamic call graph annotated with the number of memory accesses and/or cache misses obtained by profiling. We evaluate our technique on thirteen embedded applications. We compare the results to a reference system with a 4-way set associative data cache and the ideal case with the same 4-way cache and SPM, where all global and stack data is placed in the SPM. On average, our approach reduces the total system energy consumption by 8.1% with no performance degradation. This is equivalent to exploiting 60% of the room available for energy reduction between the reference case and the ideal case.
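The on-demand loading described in the abstract can be pictured with a small simulation. The sketch below is illustrative only and is not the paper's implementation: the class `SpmManager`, its method `on_call`, the page names, and the eviction order are all hypothetical. It assumes that the set of pages each function needs in the SPM has already been decided offline (in the paper, by the ILP), and it only models the manager updating a page table and counting page loads when a function is called.

```python
class SpmManager:
    """Toy model of a scratchpad manager (hypothetical, for illustration only)."""

    def __init__(self, num_frames):
        self.num_frames = num_frames  # SPM capacity in pages
        self.page_table = {}          # virtual page -> SPM frame index
        self.loads = 0                # number of pages copied into the SPM

    def on_call(self, required_pages):
        """Make every page in required_pages resident before the callee runs."""
        required = set(required_pages)
        if len(required) > self.num_frames:
            raise ValueError("function needs more pages than the SPM holds")

        missing = sorted(required - set(self.page_table))
        used = set(self.page_table.values())
        free = [f for f in range(self.num_frames) if f not in used]
        # Resident pages the callee does not need may be unloaded.
        evictable = [p for p in self.page_table if p not in required]

        for page in missing:
            if free:
                frame = free.pop(0)
            else:
                victim = evictable.pop(0)            # unload a page
                frame = self.page_table.pop(victim)  # and reuse its frame
            self.page_table[page] = frame
            self.loads += 1


manager = SpmManager(num_frames=2)
manager.on_call(["g0", "s0"])  # first call: both pages are loaded
manager.on_call(["g0", "g1"])  # "s0" is unloaded to make room for "g1"
manager.on_call(["g0", "g1"])  # everything is already resident
print(manager.page_table, manager.loads)
```

A real manager would also copy evicted pages back to external memory and update the MMU's page table entries; both are left out of this sketch.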

Published in

LCTES '07: Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
June 2007
258 pages
ISBN: 9781595936325
DOI: 10.1145/1254766

ACM SIGPLAN Notices, Volume 42, Issue 7
Proceedings of the 2007 LCTES conference
July 2007
241 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1273444

Copyright © 2007 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 116 of 438 submissions, 26%