A generalized theory of collaborative caching

Published: 15 June 2012

Abstract

Collaborative caching allows software to use hints to influence cache management in hardware. Previous theories have shown that such hints observe the inclusion property and can obtain optimal caching if the access sequence and the cache size are known ahead of time. Until now, however, the hint interface has been limited, e.g., to a binary choice between LRU and MRU.
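
As an illustration of the binary hint, the following is a minimal sketch of a fully associative cache in which each access carries an LRU or MRU hint. The function name, the trace format, and the exact hint semantics are assumptions made for this example, not taken from the paper: an access with the LRU hint is treated as in ordinary LRU, while an access with the MRU hint makes the block the next eviction candidate.

```python
def simulate_hinted_cache(trace, cache_size):
    """Count misses for a trace of (block, hint) pairs.

    Assumed semantics (illustrative only): hint "LRU" protects the block
    as ordinary LRU would; hint "MRU" makes it the next block to evict.
    """
    stack = []   # stack[0] is the next eviction victim
    misses = 0
    for block, hint in trace:
        if block in stack:
            stack.remove(block)          # hit: reposition below
        else:
            misses += 1
            if len(stack) == cache_size:
                stack.pop(0)             # evict the current victim
        if hint == "MRU":
            stack.insert(0, block)       # becomes the next victim
        else:
            stack.append(block)          # most protected position
    return misses


blocks = ["a", "b", "c", "a", "b", "c"]
plain = [(b, "LRU") for b in blocks]
hinted = [(b, "LRU" if b == "a" else "MRU") for b in blocks]
print(simulate_hinted_cache(plain, 2), simulate_hinted_cache(hinted, 2))
```

On this cyclic trace with a cache of two blocks, plain LRU misses on every access, whereas hinting `b` and `c` as MRU keeps `a` resident and saves one miss.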

In this paper, we generalize the hint interface so that a hint is a number encoding a priority. We show the generality as a hierarchical relation: collaborative caching subsumes non-collaborative caching and, within collaborative caching, the priority hint subsumes the previous binary hint. We show two theoretical results for the general hint. The first is a new cache replacement policy, priority LRU, which permits the complete range of choices between MRU and LRU. We prove a new type of inclusion property, non-uniform inclusion, and give a one-pass algorithm to compute the miss rate for all cache sizes. Second, we show that the same priority hints can obtain optimal caching for all cache sizes, without the cache size having to be known beforehand.
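
For background on why an inclusion property makes a one-pass algorithm possible, the sketch below shows the classical stack-distance computation for plain LRU (without hints). It is not the priority LRU algorithm of this paper, whose details are not given in the abstract; the function name and trace format are assumptions for illustration. Because an LRU cache of size c always holds a subset of the contents of a cache of size c + 1, an access hits in every cache at least as large as its stack distance, so one histogram of distances yields the miss count for all sizes.

```python
from collections import Counter


def lru_miss_counts(trace, max_size):
    """Return {cache_size: misses} for plain LRU, sizes 1..max_size, in one pass."""
    stack = []              # stack[0] is the most recently used block
    dist_hist = Counter()   # stack distance -> number of accesses
    cold = 0                # first-time accesses miss at every size
    for block in trace:
        if block in stack:
            dist = stack.index(block) + 1   # 1-based stack distance
            dist_hist[dist] += 1
            stack.remove(block)
        else:
            cold += 1
        stack.insert(0, block)
    # An access with stack distance d misses exactly when cache_size < d.
    return {
        size: cold + sum(n for d, n in dist_hist.items() if d > size)
        for size in range(1, max_size + 1)
    }


print(lru_miss_counts(["a", "b", "c", "a", "b", "c"], 3))
```

The list-based stack keeps the sketch short at the cost of linear work per access; the miss counts it returns never increase with cache size, which is the inclusion property in action.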
