Abstract
As caches grow larger and are shared by more cores, cache management becomes more important. This paper explores collaborative caching, which uses software hints to influence hardware caching. Recent studies have shown that such collaboration between software and hardware can, in theory, achieve optimal cache replacement on LRU-like caches.
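To make the gap between LRU and optimal replacement concrete, the following is a minimal sketch (not taken from the paper) that counts misses for a fully associative LRU cache and for Belady's optimal policy on a small cyclic trace. The function names `lru_misses` and `opt_misses`, the trace, and the cache size are illustrative assumptions.

```python
from collections import OrderedDict


def lru_misses(trace, capacity):
    """Count misses for a fully associative LRU cache."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)  # refresh recency on a hit
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = True
    return misses


def opt_misses(trace, capacity):
    """Count misses for Belady's policy: evict the block reused farthest ahead."""
    # next_use[i] is the index of the next access to trace[i], or infinity.
    next_use = [float("inf")] * len(trace)
    last_seen = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = last_seen.get(trace[i], float("inf"))
        last_seen[trace[i]] = i

    cache = {}  # block -> index of its next use
    misses = 0
    for i, block in enumerate(trace):
        if block not in cache:
            misses += 1
            if len(cache) >= capacity:
                victim = max(cache, key=cache.get)  # farthest next use
                del cache[victim]
        cache[block] = next_use[i]
    return misses


# Illustrative trace: a loop over 6 blocks, repeated 4 times, with a 4-block cache.
trace = list(range(6)) * 4
print("LRU misses:", lru_misses(trace, 4))
print("OPT misses:", opt_misses(trace, 4))
```

On this trace LRU misses on all 24 accesses, because every block is evicted just before it is reused, while the optimal policy misses 12 times. This is the kind of gap that software hints are meant to narrow.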
This paper presents Pacman, a practical solution for collaborative caching in loop-based code. Pacman uses profiling to analyze the patterns of an optimal caching policy and determine which data to cache and when. At compile time, it splits each loop into parts. At run time, it adjusts the loop boundary so that the cache selectively stores the data that an optimal policy would store. In this way, Pacman emulates the optimal policy wherever it can. Pacman requires a single bit on each load and store instruction, and some current hardware already provides partial support. This paper presents results from both simulated and real systems and compares the simulated results with related caching policies.
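The loop-splitting idea can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Pacman's actual implementation: the one-bit hint is modeled as "place this block at the eviction end of the cache", the helper names `hinted_misses` and `split_loop` are hypothetical, and the boundary value is picked by hand rather than derived from profiling.

```python
from collections import OrderedDict


def hinted_misses(accesses, capacity):
    """Count misses in a cache where each access carries a one-bit hint.

    Each access is (block, evict_first). A normal access behaves like LRU.
    A hinted access leaves its block at the eviction end of the cache, so it
    is the next victim (an assumed model of the hint, for illustration).
    """
    cache = OrderedDict()  # the first key is the next eviction victim
    misses = 0
    for block, evict_first in accesses:
        if block in cache:
            del cache[block]
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)
        cache[block] = True  # inserted at the most recently used end
        if evict_first:
            cache.move_to_end(block, last=False)  # move to the eviction end
    return misses


def split_loop(n, boundary, passes):
    """Emit the access stream of a loop over n blocks, split at `boundary`."""
    accesses = []
    for _ in range(passes):
        for i in range(0, boundary):   # first part: normal accesses
            accesses.append((i, False))
        for i in range(boundary, n):   # second part: hinted accesses
            accesses.append((i, True))
    return accesses


capacity = 4
# boundary == n means no access is hinted, so the cache behaves as plain LRU.
print("unsplit loop:", hinted_misses(split_loop(6, 6, 4), capacity))
# A boundary chosen at run time from the cache size (here capacity - 1,
# an illustrative choice) keeps the first part of the array resident.
print("split loop:  ", hinted_misses(split_loop(6, capacity - 1, 4), capacity))
```

In this toy setting the unsplit loop misses on all 24 accesses, while the split loop misses 15 times, because blocks 0 to 2 stay cached across passes and the hinted blocks evict only one another. Making the boundary a run-time parameter mirrors the idea of adjusting the split to the cache actually available.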
Pacman: program-assisted cache management
ISMM '13: Proceedings of the 2013 International Symposium on Memory Management