research-article

Branch Prediction-Directed Dynamic Instruction Cache Locking for Embedded Systems

Published: 06 October 2014
Abstract

Cache locking is a cache management technique that prevents locked cache contents from being replaced. It is often adopted to improve cache access predictability in Worst-Case Execution Time (WCET) analysis. Static cache locking methods have recently been proposed to improve Average-Case Execution Time (ACET) performance. This article presents an approach, Branch Prediction-directed Dynamic Cache Locking (BPDCL), that improves system performance by reducing cache conflict misses. In the proposed approach, the control flow graph of a program is first partitioned into disjoint execution regions; the memory blocks worth locking are then determined by calculating the locking profit for each region. These two steps are performed at compile time. At runtime, directed by branch predictions, locking routines are prefetched into a small high-speed buffer, and the predetermined cache locking contents are loaded and locked at specific execution points during program execution. Experimental results show that the proposed BPDCL method improves cache miss rate reduction by an average of 25.9%, 13.8%, and 8.0% compared with no cache locking, the static locking method, and the dynamic locking method, respectively.
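
The compile-time selection step can be illustrated with a minimal sketch. The abstract does not give the profit formula, so everything here is an assumption made for illustration: the `Block` record, the `locking_profit` definition (estimated miss cycles saved minus a one-time load-and-lock cost), the cycle constants, and the per-set greedy choice are hypothetical and are not the article's algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical per-region profile record for one memory block."""
    name: str        # label of the memory block
    cache_set: int   # cache set the block maps to
    est_misses: int  # estimated conflict misses in this region if unlocked


def locking_profit(block, miss_penalty=10, lock_cost=12):
    """Assumed profit model: miss cycles avoided minus one-time lock cost."""
    return block.est_misses * miss_penalty - lock_cost


def select_locked_blocks(region_blocks, lockable_ways=1,
                         miss_penalty=10, lock_cost=12):
    """Greedily pick, per cache set, the most profitable blocks to lock.

    At most `lockable_ways` blocks are chosen per set, and only blocks
    whose assumed profit is positive are selected.
    """
    by_set = {}
    for block in region_blocks:
        by_set.setdefault(block.cache_set, []).append(block)

    chosen = []
    for set_index in sorted(by_set):
        ranked = sorted(
            by_set[set_index],
            key=lambda b: locking_profit(b, miss_penalty, lock_cost),
            reverse=True,
        )
        for block in ranked[:lockable_ways]:
            if locking_profit(block, miss_penalty, lock_cost) > 0:
                chosen.append(block.name)
    return chosen


# One hypothetical execution region with three profiled blocks.
region = [Block("m0", 0, 40), Block("m4", 0, 5), Block("m1", 1, 1)]
locked = select_locked_blocks(region, lockable_ways=1)
```

In this sketch each execution region would get its own selection, and the resulting lists would stand in for the "predetermined cache locking contents" that are loaded and locked when execution enters the region.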

