Abstract
There are many locality metrics: for example, miss ratio to test performance, data footprint to manage cache sharing, and reuse distance to analyze and optimize a program. It is unclear how the different metrics are related, whether one subsumes another, and what combination may represent locality completely.
This paper first derives a set of formulas to convert between five locality metrics and gives the condition under which the conversion is correct. The transformation is analogous to the differentiation and integration used to convert between polynomials of higher and lower order. As a result, these metrics can be assigned an order and organized into a hierarchy.
Using the new theory, the paper then develops two techniques: one measures locality in real time without special hardware support, and the other predicts multicore cache interference without parallel testing. The paper evaluates them on sequential and parallel programs as well as on a parallel mix of sequential programs.
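The abstract does not state the conversion formulas themselves. As a rough illustration of what converting between two locality metrics can look like, the sketch below uses the classical stack-distance relation for a fully associative LRU cache rather than the paper's own derivation: the miss ratio curve is obtained by accumulating the reuse distance histogram, and the histogram is recovered by differencing the curve, a discrete analogue of integration and differentiation. The function names, the toy trace, and the convention that reuse distance counts the reused datum itself are assumptions made for this example.

```python
from collections import Counter

def reuse_distances(trace):
    """LRU stack distance of each access; None marks a first (cold) access.

    Assumed convention: the distance counts the distinct data touched since
    the previous access to the same datum, including that datum itself.
    """
    stack = []  # least recently used first, most recently used last
    out = []
    for x in trace:
        if x in stack:
            pos = stack.index(x)
            out.append(len(stack) - pos)
            stack.pop(pos)
        else:
            out.append(None)
        stack.append(x)
    return out

def miss_ratio_curve(distances, max_size):
    """mr[c] = fraction of accesses that miss in an LRU cache of c blocks.

    Accumulates the reuse distance histogram (the integration direction).
    Cold accesses are counted as misses at every cache size.
    """
    n = len(distances)
    hist = Counter(d for d in distances if d is not None)
    mr, hits = [], 0
    for c in range(max_size + 1):
        hits += hist.get(c, 0)
        mr.append((n - hits) / n)
    return mr

def histogram_from_curve(mr, n):
    """Differences the miss ratio curve to recover per-distance counts."""
    return {c: round((mr[c - 1] - mr[c]) * n) for c in range(1, len(mr))}

trace = list("abcabcaab")          # hypothetical toy trace
dists = reuse_distances(trace)
curve = miss_ratio_curve(dists, 3)
print(dists)
print(curve)
print(histogram_from_curve(curve, len(trace)))
```

The brute-force stack search keeps the sketch short; it takes time proportional to the number of distinct data per access, so it is only suitable for small traces.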
HOTL: a higher order theory of locality
ASPLOS '13: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems