research-article

All-window profiling and composable models of cache sharing

Published: 12 February 2011

Abstract

As multi-core processors become commonplace and cloud computing gains acceptance, more applications run in a shared-cache environment. Cache sharing depends on a concept called footprint, which depends on all cache accesses, not just cache misses. Previous work has recognized the importance of footprint but has not provided a method for accurate measurement, mainly because complete measurement requires counting data accesses in all execution windows, which takes time quadratic in the length of a trace. The paper first presents an algorithm, efficient enough for off-line use, that approximately measures the footprint with a guaranteed precision. The cost of the analysis can be adjusted by changing the precision. The paper then presents a composable model: for a set of programs, the model uses the all-window footprint of each program to predict its cache interference with other programs without running these programs together. The paper evaluates the efficiency of all-window profiling using the SPEC 2000 benchmarks and compares the footprint interference model with a miss-rate-based model and with exhaustive testing.
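
To make the cost argument concrete, the following is a minimal sketch of exact all-window footprint measurement by brute force. It is not the paper's approximation algorithm, which the abstract does not detail. It assumes a trace is a sequence of data addresses and takes the footprint of a window to be the number of distinct addresses accessed in it; the function name `average_footprints` is hypothetical.

```python
from typing import Dict, Hashable, Sequence


def average_footprints(trace: Sequence[Hashable]) -> Dict[int, float]:
    """Average footprint for every window length 1..n, by brute force.

    Assumption: the footprint of a window is the number of distinct
    data items accessed inside it. A trace of length n has n*(n+1)/2
    windows, so enumerating them all is quadratic in n.
    """
    n = len(trace)
    totals = {length: 0 for length in range(1, n + 1)}
    for start in range(n):
        seen = set()  # distinct items in trace[start..end]
        for end in range(start, n):
            seen.add(trace[end])
            totals[end - start + 1] += len(seen)
    # A window of a given length can start at n - length + 1 positions.
    return {length: totals[length] / (n - length + 1) for length in totals}
```

For the trace `"aabb"`, the three windows of length 2 are `aa`, `ab`, and `bb`, with footprints 1, 2, and 1, so the average footprint at length 2 is 4/3. The nested loop visits every window once, which is the quadratic cost that motivates an approximate, precision-bounded analysis.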

Published in

ACM SIGPLAN Notices, Volume 46, Issue 8
PPoPP '11
August 2011
300 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2038037

PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
February 2011
326 pages
ISBN: 9781450301190
DOI: 10.1145/1941553
General Chair: Calin Cascaval
Program Chair: Pen-Chung Yew

Copyright © 2011 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
