Abstract
Traditional approaches to managing software-programmable memories (SPMs) do not support sharing of distributed on-chip memory resources and consequently miss the opportunity to use those resources more effectively. Managing on-chip memory in many-core embedded systems with distributed SPMs requires runtime support for sharing memory among concurrently running threads with different memory demands. Runtime SPM managers cannot rely on prior knowledge of the dynamically changing mix of threads that will execute, and should therefore be designed to allocate SPM space for any unpredictable mix of threads contending for on-chip memory. This article proposes ShaVe-ICE, an operating-system-level solution with hardware support that virtualizes and ultimately shares SPM resources across a many-core embedded system to reduce the average memory latency. We present a number of simple allocation policies to improve performance and energy. Experimental results show that sharing SPMs could reduce the average execution time of the workload by up to 19.5% and the dynamic energy consumed in the memory subsystem by up to 14%.
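To make the idea of sharing distributed SPMs concrete, the following is a minimal sketch of one possible allocation policy: serve a request from the requesting core's own SPM first, then from the nearest remote SPMs, and spill whatever is left to off-chip memory. It is an illustration only and is not taken from the article. The mesh layout, page-granularity accounting, Manhattan hop distance, and the names `Spm`, `hops`, and `allocate` are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Spm:
    """One core's scratchpad, tracked as a count of free fixed-size pages (assumed model)."""
    core: tuple      # (x, y) position of the owning core on an assumed 2D mesh
    free_pages: int  # pages still available in this SPM


def hops(a, b):
    """Manhattan distance between two mesh positions (assumes XY routing)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def allocate(spms, requester, pages):
    """Place `pages` pages for a thread running on `requester`, nearest SPM first.

    Returns (placement, spilled): placement is a list of (core, pages_taken),
    and spilled is the number of pages that did not fit in any SPM and would
    have to live in off-chip memory.
    """
    placement = []
    remaining = pages
    # Visit SPMs in order of increasing distance; the local SPM has distance 0.
    for spm in sorted(spms, key=lambda s: hops(s.core, requester)):
        if remaining == 0:
            break
        take = min(spm.free_pages, remaining)
        if take > 0:
            spm.free_pages -= take
            placement.append((spm.core, take))
            remaining -= take
    return placement, remaining
```

In this sketch, a thread whose demand exceeds its local SPM borrows unused space from neighboring cores rather than going off-chip, which is the kind of opportunity the abstract says non-shared management misses. A real policy would also have to weigh network latency, contention, and energy, which this sketch does not model.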
Index Terms
ShaVe-ICE: Sharing Distributed Virtualized SPMs in Many-Core Embedded Systems