A software-only scheme for managing heap data on limited local memory (LLM) multicore processors

Published: 05 September 2013

Abstract

This article presents a scheme for managing heap data in the local memory of each core of a limited local memory (LLM) multicore architecture. Although heap data can be managed semi-automatically with a software cache, doing so may require modifying the code of other threads. Such cross-thread modifications are difficult to write and debug, and become more complex as the number of cores increases. In this article, we propose an intuitive programming interface that provides automatic and scalable heap data management. In addition, for embedded applications in which the maximum heap size can be profiled, we propose several optimizations to our heap management that significantly reduce library overhead. Our experiments on several MiBench benchmarks running on the Sony PlayStation 3 show that our scheme is natural to use and that, when the maximum heap size is known, our optimizations improve application performance by an average of 14%.
