Abstract
In this work, we present a dynamic memory allocation technique for a novel, horizontally partitioned memory subsystem targeting contemporary embedded processors with a memory management unit (MMU). We propose to replace the on-chip instruction cache with a scratchpad memory (SPM) and a small minicache. Serializing the address translation with the actual memory access enables the memory system to access either the SPM or the minicache, but not both. Independent of the SPM size and based solely on profiling information, a postpass optimizer classifies the code of an application binary into a pageable and a cacheable code region. The latter is placed at a fixed location in the external memory and cached by the minicache. The former, the pageable code region, is copied on demand to the SPM before execution. Both the pageable code region and the SPM are logically divided into pages the size of an MMU memory page. Using the MMU's pagefault exception mechanism, a runtime scratchpad memory manager (SPMM) tracks page accesses and copies frequently executed code pages to the SPM before they are executed. In order to minimize the number of page transfers from the external memory to the SPM, good code placement techniques become more important with increasing sizes of the MMU pages. We discuss code-grouping techniques and provide an analysis of the effect of the MMU's page size on execution time, energy consumption, and external memory accesses. We show that by using the data cache as a victim buffer for the SPM, significant energy savings are possible. We evaluate our SPM allocation strategy with fifteen applications, including H.264, MP3, MPEG-4, and PGP. The proposed memory system requires 8% less die area compared to a fully cached configuration. On average, we achieve a 31% improvement in runtime performance and a 35% reduction in energy consumption with an MMU page size of 256 bytes.
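The demand-paging idea described in the abstract can be illustrated with a small simulation. The following is only a toy sketch under stated assumptions, not the SPMM from the paper: the class `ToySPMManager`, the helper `count_transfers`, the FIFO eviction policy, and the copy-on-first-fault behavior are all hypothetical choices made for illustration. Only the 256-byte page size is taken from the abstract. The sketch counts how many pages are copied from external memory into the SPM for a given trace of instruction addresses.

```python
from collections import OrderedDict

PAGE_SIZE = 256  # bytes; the MMU page size quoted in the abstract


class ToySPMManager:
    """Toy model of a demand-paged scratchpad (hypothetical, not the paper's SPMM)."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.resident = OrderedDict()  # page number -> SPM frame, in load order
        self.page_transfers = 0        # copies from external memory to the SPM

    def fetch(self, address):
        """Simulate an instruction fetch from the pageable code region."""
        page = address // PAGE_SIZE
        if page not in self.resident:
            # A non-resident page stands in for the MMU raising a pagefault.
            self._handle_page_fault(page)
        return self.resident[page]

    def _handle_page_fault(self, page):
        if len(self.resident) < self.num_frames:
            frame = len(self.resident)  # next free frame
        else:
            # Assumed policy: evict the page that was loaded first (FIFO).
            _, frame = self.resident.popitem(last=False)
        self.resident[page] = frame
        self.page_transfers += 1


def count_transfers(trace, num_frames):
    """Return the number of page copies needed to execute an address trace."""
    spmm = ToySPMManager(num_frames)
    for address in trace:
        spmm.fetch(address)
    return spmm.page_transfers


# Two hot functions called alternately, with a single SPM frame available.
grouped = [0x000, 0x080] * 10    # both functions placed in the same page
scattered = [0x000, 0x400] * 10  # functions placed in different pages
print(count_transfers(grouped, 1), count_transfers(scattered, 1))
```

In this toy setup, placing the two hot functions in the same page needs a single page transfer, while scattering them across pages causes a transfer on every call when only one frame is free. This is the kind of effect that makes code grouping matter more as the page size grows.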
Dynamic scratchpad memory management for code in portable systems with an MMU