Abstract
Caches in multicore machines are often shared, so cache performance depends on how the memory accesses of different programs interleave with one another. The full range of performance possibilities covers all possible interleavings, which are too numerous to study experimentally for any mix of non-trivial programs.
This paper presents a theory that characterizes the effect of memory access interleaving caused by the parallel execution of non-data-sharing programs. The theory measures cache demand with an established metric called the footprint, which can be used to calculate miss ratios in fully associative LRU caches, and it considers the full range of interleaving possibilities. The paper proves a lower bound for the footprints of interleaved traces and then formulates an upper bound in terms of the footprints of the constituent traces. It also shows the correctness of the footprint composition used in a number of existing techniques and places precise bounds on its accuracy.
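To make the footprint metric concrete, the following is a minimal brute-force sketch. It assumes the usual definition of the footprint as the average number of distinct data items accessed in all windows of a given length. The traces, the function names `footprint` and `round_robin`, and the equal-rate round-robin interleaving are hypothetical choices made for illustration. The composed estimate, which adds the constituent footprints at half the window length, is likewise an assumption suited to this one interleaving; it does not reproduce the bounds proved in the paper.

```python
from itertools import chain


def footprint(trace, w):
    """Average number of distinct addresses over all length-w windows of trace."""
    n = len(trace)
    if not 1 <= w <= n:
        raise ValueError("window length must be in 1..len(trace)")
    # Brute force: inspect every window; fine for small illustrative traces.
    total = sum(len(set(trace[i:i + w])) for i in range(n - w + 1))
    return total / (n - w + 1)


def round_robin(a, b):
    """Interleave two equal-length traces one access at a time (one of many interleavings)."""
    return list(chain.from_iterable(zip(a, b)))


# Two hypothetical traces over disjoint address sets (no data sharing).
trace_a = ["a1", "a2", "a1", "a3", "a2", "a1"]
trace_b = ["b1", "b1", "b2", "b1", "b2", "b2"]

mixed = round_robin(trace_a, trace_b)
w = 4

# Footprint measured directly on the interleaved trace.
measured = footprint(mixed, w)

# Assumed composition for an equal-rate interleaving: each program
# contributes its own footprint over half of the shared window.
composed = footprint(trace_a, w // 2) + footprint(trace_b, w // 2)

print(f"measured fp({w}) = {measured:.3f}")
print(f"composed estimate = {composed:.3f}")
```

Because the two traces share no data, the interleaved footprint can never exceed the total number of distinct addresses in both traces. A different interleaving of the same two traces would in general yield a different measured value, which is the variation the theory sets out to bound.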
Prediction and bounds on shared cache demand from memory access interleaving
ISMM 2018: Proceedings of the 2018 ACM SIGPLAN International Symposium on Memory Management