Abstract
Scratchpad memory (SPM) is a promising on-chip memory choice in real-time and cyber-physical systems, where timing is of the utmost importance. SPM is time-predictable because data movement between the SPM and the main memory is managed entirely by software. One such management approach is dynamic management: in dynamic management of instruction SPMs, code blocks are copied from the main memory to the SPM at runtime by executing direct memory access (DMA) instructions. Code management techniques try to minimize the overhead of these DMA operations by finding an allocation scheme that uses the SPM efficiently. In this article, we present three function-level code management techniques. They perform allocation at the granularity of functions, with the objective of minimizing the impact of DMA overhead on the worst-case execution time (WCET) of a given program. The first technique finds an optimal mapping of each function to a region using integer linear programming (ILP), whereas the second is a suboptimal polynomial-time heuristic. The third technique maps functions directly to SPM addresses rather than to regions, which can further reduce the WCET; because it is also based on ILP, it can also find an optimal mapping. We evaluate our techniques using the Mälardalen WCET suite, the MiBench suite, and proprietary automotive applications from industry. The results show that our techniques can significantly reduce WCET estimates compared to caches analyzed with state-of-the-art cache analysis.
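The region-based idea can be made concrete with a toy model. Functions mapped to the same region evict one another, so every return to an evicted caller costs another DMA transfer. The sketch below is not the authors' ILP formulation. It assumes a single execution trace as a stand-in for the worst-case path (the trace lists the function whose code runs at each step, so a return to a caller appears as the caller again), a DMA cost of a fixed setup term plus a per-byte term, and hypothetical helper names (`set_partitions`, `dma_cost`, `optimal_mapping`). It finds the cheapest function-to-region mapping by exhaustive search, which is only tractable for a handful of functions; in the article, ILP finds the optimal mapping at realistic sizes, and the heuristic trades optimality for polynomial running time.

```python
def set_partitions(items):
    """Yield every partition of `items` into non-empty groups (one group = one region)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # Put `first` into each existing group...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ...or give it a new group of its own.
        yield [[first]] + part


def dma_cost(trace, region_of, sizes, setup=10, per_byte=1):
    """Total DMA cycles to run `trace` under a function-to-region mapping.

    Assumed model: each region holds one function at a time, and running a
    function that is not resident in its region copies the whole function.
    """
    resident = {}  # region index -> function currently loaded there
    total = 0
    for fn in trace:
        r = region_of[fn]
        if resident.get(r) != fn:
            total += setup + per_byte * sizes[fn]
            resident[r] = fn
    return total


def optimal_mapping(sizes, trace, spm_size, setup=10, per_byte=1):
    """Exhaustive search over region assignments (a stand-in for the ILP).

    Returns (cost, regions), or None if no mapping fits in the SPM.
    """
    best = None
    for part in set_partitions(sorted(sizes)):
        # A region must be as large as the largest function mapped to it.
        if sum(max(sizes[f] for f in group) for group in part) > spm_size:
            continue
        region_of = {f: i for i, group in enumerate(part) for f in group}
        cost = dma_cost(trace, region_of, sizes, setup, per_byte)
        if best is None or cost < best[0]:
            best = (cost, part)
    return best


if __name__ == "__main__":
    sizes = {"main": 40, "f": 30, "g": 30, "h": 20}  # hypothetical sizes in bytes
    trace = ["main", "f", "main", "g", "main", "f", "main", "h", "main"]
    print(optimal_mapping(sizes, trace, spm_size=80))  # (200, [['f', 'g', 'h'], ['main']])
```

With 80 bytes of SPM, no mapping can give every function its own region, so the search keeps `main` in a region of its own (it is re-entered after every call) and lets the three callees share a second region. A WCET-aware formulation would instead weigh each reload by how often it can occur on the worst-case path rather than along one fixed trace.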
WCET-Aware Function-Level Dynamic Code Management on Scratchpad Memory