
ShaVe-ICE: Sharing Distributed Virtualized SPMs in Many-Core Embedded Systems

Published: 05 February 2018

Abstract

Traditional approaches for managing software-programmable memories (SPMs) do not support sharing of distributed on-chip memory resources and consequently miss the opportunity to better utilize them. In many-core embedded systems with distributed SPMs, managing on-chip memory requires runtime support for sharing memory resources among concurrently running threads with different memory demands. Because runtime SPM managers cannot rely on prior knowledge of the dynamically changing mix of threads that will execute, they should be designed to allocate SPM space for any unpredictable mix of threads contending for on-chip memory. This article proposes ShaVe-ICE, an operating-system-level solution with hardware support that virtualizes and ultimately shares SPM resources across a many-core embedded system to reduce the average memory latency. We present a number of simple allocation policies to improve performance and energy efficiency. Experimental results show that sharing SPMs can reduce the average execution time of the workload by up to 19.5% and the dynamic energy consumed in the memory subsystem by up to 14%.
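To make the idea of sharing distributed SPMs concrete, here is a minimal sketch. It is not one of the allocation policies proposed in the article. It assumes a hypothetical 2D mesh of cores with row-major core IDs, SPM space counted in KB, and a greedy nearest-first rule: a thread's request is served from its local SPM first, then from remote SPMs in order of hop distance, and whatever cannot be placed on chip spills to off-chip memory. The names `hops`, `allocate`, and `Allocation` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    placements: list[tuple[int, int]]  # (core id, KB granted in that core's SPM)
    offchip_kb: int                    # remainder that falls back to off-chip memory


def hops(a: int, b: int, width: int) -> int:
    """Manhattan (hop) distance between two cores on a mesh with row-major ids."""
    ax, ay = a % width, a // width
    bx, by = b % width, b // width
    return abs(ax - bx) + abs(ay - by)


def allocate(requester: int, demand_kb: int, free_kb: list[int], width: int) -> Allocation:
    """Hypothetical greedy policy: local SPM first, then remote SPMs by hop count.

    Ties in distance are broken by core id. free_kb is updated in place to
    reflect the space granted to this request.
    """
    order = sorted(range(len(free_kb)), key=lambda c: (hops(requester, c, width), c))
    placements = []
    remaining = demand_kb
    for core in order:
        if remaining == 0:
            break
        grant = min(remaining, free_kb[core])
        if grant > 0:
            free_kb[core] -= grant
            placements.append((core, grant))
            remaining -= grant
    # Anything left over cannot be placed in any SPM and goes off-chip.
    return Allocation(placements, remaining)
```

For example, on a 2×2 mesh with free SPM space `[4, 16, 0, 8]` KB, a 24 KB request from core 0 receives 4 KB locally, 16 KB from core 1, and 4 KB from core 3; a subsequent 10 KB request from core 3 receives the 4 KB left in its own SPM and spills 6 KB off-chip. Ordering by hop distance targets the abstract's goal of lower average memory latency, since nearer SPMs are cheaper to reach than farther ones or off-chip memory; a real runtime manager would also have to account for contention and for reclaiming space when threads finish.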
