Abstract
This article presents a scheme for managing heap data in the local memory of each core in a limited local memory (LLM) multicore architecture. Although heap data can be managed semi-automatically with a software cache, doing so may require modifying the code of other threads. Such cross-thread modifications are very difficult to write and debug, and they become more complex and challenging as the number of cores increases. We propose an intuitive programming interface that provides automatic and scalable heap data management. In addition, for embedded applications in which the maximum heap size can be profiled, we propose several optimizations to our heap management that significantly reduce library overheads. Our experiments on several MiBench benchmarks running on the Sony PlayStation 3 show that our scheme is natural to use and that, when the maximum heap size is known, our optimizations improve application performance by 14% on average.
Index Terms
A software-only scheme for managing heap data on limited local memory (LLM) multicore processors