research-article

HOTL: a higher order theory of locality

Published: 16 March 2013

Abstract

There are many locality metrics: for example, miss ratio to test performance, data footprint to manage cache sharing, and reuse distance to analyze and optimize a program. It is unclear how the different metrics are related, whether one subsumes another, and what combination may represent locality completely.
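To make the three metrics named above concrete, here is a minimal Python sketch that computes each of them directly from a memory trace. It is not the paper's method: it uses textbook definitions, the function names are hypothetical, and it assumes a fully associative LRU cache and a reuse distance that counts the reused datum itself.

```python
from collections import OrderedDict


def reuse_distances(trace):
    """Per access: number of distinct data touched since the previous access
    to the same datum, counting the datum itself (None for a first access).
    Assumption: inclusive counting, so distance d hits in an LRU cache of size >= d."""
    stack = OrderedDict()  # LRU stack: most recently used key is last
    out = []
    for x in trace:
        if x in stack:
            keys = list(stack)
            # Depth from the top of the LRU stack, 1 = most recently used
            out.append(len(keys) - keys.index(x))
            stack.move_to_end(x)
        else:
            out.append(None)
            stack[x] = True
    return out


def lru_miss_ratio(trace, cache_size):
    """Miss ratio of a fully associative LRU cache that holds cache_size data."""
    distances = reuse_distances(trace)
    misses = sum(1 for d in distances if d is None or d > cache_size)
    return misses / len(trace)


def average_footprint(trace, window):
    """Average number of distinct data over all windows of the given length."""
    counts = [len(set(trace[i:i + window]))
              for i in range(len(trace) - window + 1)]
    return sum(counts) / len(counts)


trace = list("abcabc")
print(reuse_distances(trace))       # [None, None, None, 3, 3, 3]
print(lru_miss_ratio(trace, 2))     # 1.0
print(lru_miss_ratio(trace, 3))     # 0.5
print(average_footprint(trace, 2))  # 2.0
```

The brute-force loops are chosen for readability, not speed; they only illustrate what each metric measures on a small trace.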

This paper first derives a set of formulas to convert between five locality metrics and gives the condition for correctness. The transformation is analogous to differentiation and integration used to convert between higher order polynomials. As a result, these metrics can be assigned an order and organized into a hierarchy.

Using the new theory, the paper then develops two techniques: one measures locality in real time without special hardware support, and the other predicts multicore cache interference without parallel testing. The paper evaluates them using sequential and parallel programs as well as a parallel mix of sequential programs.

    Published in

      ACM SIGPLAN Notices, Volume 48, Issue 4
      ASPLOS '13
      April 2013
      540 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/2499368

      ASPLOS '13: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
      March 2013
      574 pages
      ISBN: 9781450318709
      DOI: 10.1145/2451116

      Copyright © 2013 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States
