ABSTRACT
Reuse distance (i.e., LRU stack distance) precisely characterizes program locality and has been a basic tool for memory system research since the 1970s. However, the high cost of measuring it has restricted its practical use in performance debugging, locality analysis, and optimization of long-running applications. In this work, we improve measurement efficiency by exploring the connection between time and locality. We propose a statistical model that converts cheaply obtained time distance to the more costly reuse distance. Compared to the state-of-the-art technique, this approach reduces measuring time by a factor of 17. It approximates cache line reuses with over 99% accuracy and the cache miss rate with less than 0.4% average error for 12 SPEC 2000 integer and floating-point benchmarks. By exploiting the strong correlation between time and locality, this work makes precise locality as easy to obtain as data access frequency and opens new opportunities for program optimization.
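To make the two quantities concrete, the following is a minimal Python sketch and not the statistical model proposed in this work. It assumes that the time distance of a reuse is the difference in access positions between the two accesses to the same element, and that the reuse distance is the number of distinct other elements accessed in between; conventions that differ by one exist. The function names `time_and_reuse_distances` and `lru_miss_rate` and the sample trace are hypothetical.

```python
def time_and_reuse_distances(trace):
    """Return (time_distances, reuse_distances), one entry per reuse.

    Assumed definitions (conventions may differ by one):
      time distance  = difference in access positions of the two accesses
      reuse distance = number of distinct elements accessed in between
    """
    last_seen = {}                # element -> position of its latest access
    time_d, reuse_d = [], []
    for i, x in enumerate(trace):
        if x in last_seen:
            j = last_seen[x]
            time_d.append(i - j)  # cheap: one lookup, one subtraction
            # costly: naive scan of the window, O(N) per access
            reuse_d.append(len(set(trace[j + 1:i])))
        last_seen[x] = i
    return time_d, reuse_d


def lru_miss_rate(trace, capacity):
    """Miss rate of a fully associative LRU cache holding `capacity` elements.

    A reuse hits exactly when its reuse distance is below the capacity;
    first-time accesses are counted as cold misses.
    """
    _, reuse_d = time_and_reuse_distances(trace)
    cold_misses = len(set(trace))
    capacity_misses = sum(1 for d in reuse_d if d >= capacity)
    return (cold_misses + capacity_misses) / len(trace)


trace = list("abcbaa")
time_d, reuse_d = time_and_reuse_distances(trace)
print(time_d, reuse_d, lru_miss_rate(trace, 2))
```

The sketch shows the cost asymmetry the abstract refers to: the time distance needs only the position of the last access, whereas the naive reuse distance rescans the accesses in between. Faster exact methods exist, but they still do more work per access than recording a timestamp.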
Locality approximation using time
Proceedings of the 2007 POPL Conference