Memory allocation for embedded systems with a compile-time-unknown scratch-pad size

Published: 22 April 2009
Abstract

This article presents the first memory allocation scheme for embedded systems having a scratch-pad memory whose size is unknown at compile time. A scratch-pad memory (SPM) is a fast compiler-managed SRAM that replaces the hardware-managed cache. All existing memory allocation schemes for SPM require the SPM size to be known at compile time. Because of this constraint, the resulting executable is tied to that SPM size and is not portable to other processor implementations having a different SPM size. Size-portable code is valuable when programs are downloaded during deployment, either via a network or portable media. Code downloads are used for fixing bugs or for enhancing functionality. Different devices commonly have different SPM sizes because VLSI technology evolves over the years. As a result, SPM cannot be used in such situations with downloaded code.

To overcome this limitation, our work presents a compiler method whose resulting executable is portable across SPMs of any size. Our technique employs customized installer software, which decides the SPM allocation just before the program's first run, since the SPM size can be discovered at that time. Based on the decided allocation, the installer then modifies the program executable accordingly. The resulting executable places frequently used objects in SPM, considering both code and data for placement. To keep the overhead low, much of the preprocessing for the allocation is done at compile time. Results show that our benchmarks average a 41% speedup versus an all-DRAM allocation, while the optimal static allocation scheme, which knows the SPM size at compile time and is thus an unachievable upper bound, is only slightly faster (45% faster than all-DRAM). Results also show that the overhead from our customized installer averages about 1.5% in code size, 2% in runtime, and 3% in compile time for our benchmarks.
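
The split between compile-time preprocessing and install-time decision can be illustrated with a minimal sketch. This is not the article's algorithm: it assumes a simple greedy policy (rank objects by profiled accesses per byte, then fill the SPM in that order), and every name in it (`ProgramObject`, `compile_time_ranking`, `install_time_allocation`, the sample objects and their sizes) is hypothetical. It only shows why the expensive ranking can be done once at compile time while the size-dependent cut is deferred until the SPM size is known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProgramObject:
    name: str      # hypothetical symbol name (code or data object)
    size: int      # size in bytes, assumed > 0
    accesses: int  # profiled access count (assumed to come from profiling)


def compile_time_ranking(objects):
    """Compile-time step: order objects by accesses per byte, densest first.

    This does not depend on the SPM size, so it can be precomputed.
    """
    return sorted(objects, key=lambda o: o.accesses / o.size, reverse=True)


def install_time_allocation(ranked, spm_size):
    """Install-time step: walk the precomputed ranking and fill the SPM greedily.

    Objects that do not fit in the remaining space stay in DRAM.
    """
    in_spm, in_dram, free = [], [], spm_size
    for obj in ranked:
        if obj.size <= free:
            in_spm.append(obj.name)
            free -= obj.size
        else:
            in_dram.append(obj.name)
    return in_spm, in_dram


# Illustrative objects only; the sizes and counts are made up.
objects = [
    ProgramObject("hot_loop", 512, 90_000),
    ProgramObject("lookup_table", 1024, 40_000),
    ProgramObject("init_code", 2048, 100),
    ProgramObject("small_counter", 4, 5_000),
]
ranked = compile_time_ranking(objects)

# The same ranking serves devices with different SPM sizes.
for spm_size in (1024, 4096):
    spm, dram = install_time_allocation(ranked, spm_size)
    print(spm_size, "SPM:", spm, "DRAM:", dram)
```

In this sketch the installer's work is a single linear pass over a precomputed list, which is consistent in spirit with keeping install-time overhead low; a real installer would additionally have to patch addresses in the executable, which is not modeled here.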
