Abstract
The goal of cache management is to maximize data reuse. Collaborative caching provides an interface for software to communicate access information to hardware. In theory, it can obtain optimal cache performance.
In this paper, we study a collaborative caching system that allows a program to choose different caching methods for its data. As an interface, it may be used in arbitrary ways: sometimes optimally, but probably suboptimally most of the time, and sometimes even counterproductively. We develop a theoretical foundation for collaborative caches to show the inclusion principle and the existence of a distance metric that we call LRU-MRU stack distance. The new stack distance is important for program analysis and transformation because it lets them target a hierarchical collaborative cache system rather than a single cache configuration. We use 10 benchmark programs to show that optimal caching may reduce the average miss ratio by 24%, and that a simple feedback-driven compilation technique can use collaborative caching to realize 50% of the optimal improvement.
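As a rough illustration of the interface described above, the following Python sketch simulates a small fully associative cache in which each access carries a software hint. It is a minimal sketch under stated assumptions, not the paper's implementation: the function name `simulate`, the hint labels `"LRU"` and `"MRU"`, and the rule that an MRU-tagged access places its block at the eviction end of the stack are all assumptions made for illustration.

```python
def simulate(trace, capacity):
    """Count misses for a trace of (address, hint) pairs.

    Assumed model (hypothetical, for illustration only):
    - the cache is a stack; index 0 is the next block to evict
    - an "LRU" access moves its block to the top (evicted last)
    - an "MRU" access moves its block to the bottom (evicted next)
    """
    stack = []
    misses = 0
    for addr, hint in trace:
        if addr in stack:
            stack.remove(addr)          # hit: take the block out to reposition it
        else:
            misses += 1
            if len(stack) == capacity:  # miss in a full cache: evict the bottom block
                stack.pop(0)
        if hint == "LRU":
            stack.append(addr)          # keep as long as possible
        else:
            stack.insert(0, addr)       # mark as the next eviction candidate
    return misses


# A cyclic working set of 3 blocks in a 2-block cache thrashes under pure LRU.
addresses = ["a", "b", "c"] * 3
all_lru = [(x, "LRU") for x in addresses]
# Hypothetical hints: protect "a" with LRU, let "b" and "c" pass through via MRU.
hinted = [(x, "LRU" if x == "a" else "MRU") for x in addresses]

print(simulate(all_lru, 2))  # 9 misses
print(simulate(hinted, 2))   # 7 misses
```

In this toy trace the hinted run keeps block `a` resident, so it avoids two of the nine misses that pure LRU incurs. A different hint assignment may do better or worse, which matches the abstract's point that the interface can be used optimally, suboptimally, or counterproductively.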
On the theory and potential of LRU-MRU collaborative cache management
ISMM '11: Proceedings of the international symposium on Memory management