Prediction and bounds on shared cache demand from memory access interleaving

Published: 18 June 2018

Abstract

Caches in multicore machines are often shared, and shared-cache performance depends on how the memory accesses of different programs interleave with one another. Characterizing the full range of possible performance means considering every possible interleaving, and for any mix of non-trivial programs there are far too many interleavings to study experimentally.
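To make the scale concrete: if each program's own access order is fixed, two traces of n and m accesses can be interleaved in C(n+m, n) ways, and k traces in the corresponding multinomial number of ways. The short Python sketch below counts them. The function name `count_interleavings` is hypothetical and used only for illustration.

```python
from math import comb


def count_interleavings(lengths):
    """Count the interleavings of several traces that keep each trace's own order.

    The result is the multinomial coefficient (n1 + ... + nk)! / (n1! * ... * nk!),
    built up as a product of binomial coefficients: the i-th trace's accesses
    choose their positions among the first n1 + ... + ni slots.
    """
    total = 0
    result = 1
    for n in lengths:
        total += n
        result *= comb(total, n)
    return result


if __name__ == "__main__":
    print(count_interleavings([2, 2]))  # 6
    # Two traces of only 1,000 accesses each: a number with over 600 digits.
    print(len(str(count_interleavings([1000, 1000]))))
```

Even for two traces of 1,000 accesses each, the count exceeds 10^600, so exhaustive experimentation is out of the question.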

This paper presents a theory to characterize the effect of memory access interleaving due to parallel execution of non-data-sharing programs. The theory uses an established metric called the footprint (which can be used to calculate miss ratios in fully-associative LRU caches) to measure cache demand, and considers the full range of interleaving possibilities. The paper proves a lower bound for footprints of interleaved traces, and then formulates an upper bound in terms of the footprints of the constituent traces. It also shows the correctness of footprint composition used in a number of existing techniques, and places precise bounds on its accuracy.
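The following is a minimal sketch of the footprint and of one common form of footprint composition. Here the footprint fp(w) is taken as the average number of distinct addresses over all windows of length w in a trace. The composition rule assumes that each non-data-sharing program contributes to every window of the interleaved trace in proportion to its share of the accesses. It then sums the constituent footprints at those scaled window lengths. Summing is reasonable because the programs share no data, so the distinct addresses in any single window of the interleaved trace are exactly the sum of each program's distinct addresses in that window. All names (`footprint`, `fp_interp`, `composed_footprint`, `round_robin`) are hypothetical. The linear interpolation for fractional window lengths is an assumption. The simple sliding-window computation is chosen for clarity, not efficiency.

```python
from collections import Counter


def footprint(trace, w):
    """Average number of distinct addresses over all len(trace) - w + 1 windows of length w."""
    n = len(trace)
    if not 1 <= w <= n:
        raise ValueError("window length must be in [1, len(trace)]")
    counts = Counter(trace[:w])
    total = len(counts)
    for i in range(w, n):
        # Slide the window right by one: add trace[i], drop trace[i - w].
        counts[trace[i]] += 1
        old = trace[i - w]
        counts[old] -= 1
        if counts[old] == 0:
            del counts[old]
        total += len(counts)
    return total / (n - w + 1)


def fp_interp(trace, x):
    """Footprint at a possibly fractional window length (assumed: linear interpolation, fp(0) = 0)."""
    lo = int(x)
    frac = x - lo
    f_lo = footprint(trace, lo) if lo > 0 else 0.0
    if frac == 0:
        return f_lo
    f_hi = footprint(trace, lo + 1)
    return f_lo + frac * (f_hi - f_lo)


def composed_footprint(traces, w):
    """Predicted footprint of an interleaving of non-data-sharing traces.

    Assumes each trace occupies a share of every window proportional to its length.
    """
    total = sum(len(t) for t in traces)
    return sum(fp_interp(t, w * len(t) / total) for t in traces)


def round_robin(a, b):
    """Alternate accesses from two equal-length traces: a[0], b[0], a[1], b[1], ..."""
    out = []
    for p, q in zip(a, b):
        out += [p, q]
    return out


if __name__ == "__main__":
    A = ["a", "b", "a", "b"]  # program A's trace
    B = ["x", "y", "z", "x"]  # program B's trace, disjoint from A
    rr = round_robin(A, B)    # a x b y a z b x
    print(footprint(rr, 4))               # 4.0
    print(footprint(A + B, 4))            # 3.2 (A runs to completion, then B)
    print(composed_footprint([A, B], 4))  # 4.0
```

In this toy example, the composed prediction matches the round-robin interleaving at w = 4 (both 4.0). Running the same two programs back to back, which is also a valid interleaving, gives 3.2. The choice of interleaving alone changes the footprint, even though the constituent traces are identical.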

