Abstract
Software applications use dynamic memory (allocated and deallocated in the system's heap) to handle dynamism in their working conditions. Embedded systems tend to include complex memory organizations, but most dynamic memory management techniques do not deal with the placement of data objects in physical memory modules. Additionally, the performance of hardware-controlled cache memories may be severely hindered when they are used with linked data structures. We therefore present a methodology to map dynamic data onto the multilevel memory subsystem of embedded systems, taking advantage of any available memories (e.g., on-chip SRAMs) and avoiding interference with the cache memories. The resulting data placement uses an exclusive memory model and is compatible with existing techniques for managing static data. Our methodology helps the designer obtain, in an automated way, the reductions in energy consumption and execution time that an expert could achieve, while keeping control over the process through multiple configuration knobs.
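To make the idea of an exclusive placement of dynamic data more concrete, here is a minimal Python sketch. It is an illustrative greedy heuristic under stated assumptions, not the methodology of this work. It assumes that each dynamic data type (DDT) has been profiled for its peak footprint and number of accesses, and that each memory module has a known capacity and energy per access. DDTs are visited in decreasing order of access density (accesses per byte), and each is assigned to the cheapest module that still has room. Every DDT lands in exactly one module, which is what "exclusive" means here: no copy of the data is kept at another level, unlike in an inclusive cache hierarchy. The names `MemoryModule`, `DynamicDataType`, `place`, and `estimate_energy`, and all numbers in the example, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryModule:
    name: str
    capacity: int             # bytes reserved for dynamic data (hypothetical)
    energy_per_access: float  # illustrative cost, e.g. nJ per access


@dataclass(frozen=True)
class DynamicDataType:
    name: str
    footprint: int  # peak bytes allocated for this type (from profiling)
    accesses: int   # profiled number of accesses to its instances

    @property
    def density(self) -> float:
        # Accesses per byte: how much each byte of scarce fast memory is worth.
        return self.accesses / self.footprint


def place(ddts, modules):
    """Greedy exclusive placement: every DDT is assigned to exactly one module."""
    by_cost = sorted(modules, key=lambda m: m.energy_per_access)
    free = {m.name: m.capacity for m in modules}
    placement = {}
    for ddt in sorted(ddts, key=lambda d: d.density, reverse=True):
        for mod in by_cost:  # try the cheapest module first
            if ddt.footprint <= free[mod.name]:
                free[mod.name] -= ddt.footprint
                placement[ddt.name] = mod.name
                break
        else:
            raise ValueError(f"{ddt.name} does not fit in any module")
    return placement


def estimate_energy(ddts, modules, placement):
    """Total access energy of a placement under the per-access cost model."""
    cost = {m.name: m.energy_per_access for m in modules}
    return sum(d.accesses * cost[placement[d.name]] for d in ddts)


if __name__ == "__main__":
    mods = [MemoryModule("DRAM", 64 * 2**20, 2.0),
            MemoryModule("SRAM", 32 * 1024, 0.1)]
    ddts = [DynamicDataType("list_nodes", 16 * 1024, 800_000),
            DynamicDataType("packet_buffers", 24 * 1024, 300_000),
            DynamicDataType("hash_table", 8 * 1024, 500_000)]
    p = place(ddts, mods)
    print(p)  # {'hash_table': 'SRAM', 'list_nodes': 'SRAM', 'packet_buffers': 'DRAM'}
    print(estimate_energy(ddts, mods, p))
```

With a 32 KB SRAM, the two densest DDTs fit on chip and the packet buffers fall back to DRAM. Ordering by density is a classic knapsack heuristic. It is not guaranteed to be optimal, and this sketch never splits a DDT across modules.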
Placement of Linked Dynamic Data Structures over Heterogeneous Memories in Embedded Systems