research-article

Pacman: program-assisted cache management

Published: 20 June 2013

Abstract

As caches become larger and are shared by an increasing number of cores, cache management is becoming more important. This paper explores collaborative caching, which uses software hints to influence hardware caching. Recent studies have shown that such collaboration between software and hardware can theoretically achieve optimal cache replacement on an LRU-like cache.

This paper presents Pacman, a practical solution for collaborative caching in loop-based code. Pacman uses profiling to analyze patterns in an optimal caching policy in order to determine which data to cache and at what time. It then splits each loop into different parts at compile time. At run time, the loop boundary is adjusted to selectively cache the data that an optimal policy would cache. In this way, Pacman emulates the optimal policy wherever it can. Pacman requires a single bit on load and store instructions, which some current hardware partially supports. This paper presents results using both simulated and real systems, and compares simulated results to related caching policies.
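
The loop-splitting idea can be illustrated with a toy simulation. The sketch below is not Pacman's implementation: it assumes a fully associative LRU cache in which a one-bit hint places the accessed line at the eviction end of the recency order, and it picks the split boundary by hand instead of deriving it from a profile of the optimal policy. The names `HintedLRUCache`, `evict_soon`, `run_loop`, and `split` are hypothetical and exist only for illustration.

```python
from collections import OrderedDict


class HintedLRUCache:
    """Toy fully associative LRU cache with a one-bit per-access hint (assumed model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # first key is the next eviction victim
        self.misses = 0

    def access(self, addr, evict_soon=False):
        hit = addr in self.lines
        if hit:
            del self.lines[addr]
        else:
            self.misses += 1
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)  # evict the line at the LRU end
        self.lines[addr] = True  # insert at the most-recently-used end
        if evict_soon:
            # Hinted access: move the line to the eviction end instead.
            self.lines.move_to_end(addr, last=False)
        return hit


def run_loop(cache, n, passes, split=None):
    """Traverse addresses 0..n-1 repeatedly; indices >= split carry the hint."""
    boundary = n if split is None else split
    for _ in range(passes):
        for i in range(boundary):          # first part of the split loop: normal accesses
            cache.access(i)
        for i in range(boundary, n):       # second part: hinted accesses
            cache.access(i, evict_soon=True)
    return cache.misses


# Hand-picked boundary (an assumption, not a profiled value): keep 3 of 8 lines.
plain_misses = run_loop(HintedLRUCache(4), n=8, passes=3)
split_misses = run_loop(HintedLRUCache(4), n=8, passes=3, split=3)
print(plain_misses, split_misses)
```

In this toy setting, plain LRU misses on every access because the traversal is larger than the cache, while the split loop keeps the unhinted prefix resident across passes and lets the hinted accesses recycle a single line.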
