Abstract
Cache locking is a cache management technique that prevents locked cache contents from being replaced. It is often adopted to improve cache access predictability in Worst-Case Execution Time (WCET) analysis, and static cache locking methods have recently been proposed to improve Average-Case Execution Time (ACET) performance. This article presents Branch Prediction-directed Dynamic Cache Locking (BPDCL), an approach that improves system performance by reducing cache conflict misses. The approach first partitions the control flow graph of a program into disjoint execution regions, then determines which memory blocks are worth locking by calculating the locking profit for each region. Both steps are performed at compile time. At runtime, directed by branch predictions, locking routines are prefetched into a small high-speed buffer, and the predetermined cache locking contents are loaded and locked at specific execution points during program execution. Experimental results show that BPDCL achieves average improvements of 25.9%, 13.8%, and 8.0% in cache miss rate reduction compared with no cache locking, the static locking method, and the dynamic locking method, respectively.
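To illustrate why locking can reduce conflict misses, the following is a minimal toy sketch, not the BPDCL algorithm itself. It simulates a small set-associative LRU cache in which locked blocks are preloaded and never evicted. The function name `count_misses`, the cache geometry, and the access trace are all hypothetical choices made for illustration, and the cost of preloading locked blocks is ignored.

```python
from collections import OrderedDict

def count_misses(trace, num_sets, ways, locked=frozenset()):
    """Count misses in a toy set-associative LRU cache.

    Blocks in `locked` are preloaded and never evicted (assumption:
    the preload cost is not counted as a miss).
    """
    sets = [OrderedDict() for _ in range(num_sets)]
    for block in locked:
        s = sets[block % num_sets]
        if len(s) >= ways:
            raise ValueError("more locked blocks than ways in one set")
        s[block] = True

    misses = 0
    for block in trace:
        s = sets[block % num_sets]
        if block in s:
            s.move_to_end(block)  # hit: mark as most recently used
            continue
        misses += 1
        if len(s) >= ways:
            # Evict the least recently used block that is not locked.
            victim = next((b for b in s if b not in locked), None)
            if victim is None:
                continue  # every way is locked: the access bypasses the cache
            del s[victim]
        s[block] = True
    return misses

# Blocks 0, 2, and 4 all map to set 0 of a 2-set, 2-way cache,
# so a cyclic access pattern thrashes under plain LRU.
trace = [0, 2, 4] * 4
print(count_misses(trace, num_sets=2, ways=2))              # no locking
print(count_misses(trace, num_sets=2, ways=2, locked={0}))  # block 0 locked
```

In this trace, plain LRU misses on every access because three blocks compete for two ways, whereas locking block 0 turns each of its accesses into a hit. Choosing which blocks to lock, and when, is the problem the locking-profit calculation described above addresses.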