ABSTRACT
Techniques to evaluate a program's cache performance fall into two camps: (1) Traditional trace-based cache simulators precisely account for sophisticated real-world cache models and support arbitrary workloads, but their runtime is proportional to the number of memory accesses performed by the program under analysis. (2) Analytical approaches, which rely on implicit workload characterizations such as the polyhedral model, often achieve problem-size-independent runtimes, but have so far been limited to idealized cache models.
We introduce a hybrid approach, warping cache simulation, that aims to achieve both applicability to real-world cache models and problem-size-independent runtimes. Like prior analytical approaches, we focus on programs in the polyhedral model, which allows us to reason about the sequence of memory accesses analytically. Combining this analytical reasoning with information about the cache behavior obtained from explicit cache simulation allows us to soundly fast-forward the simulation. By this process of warping, we accelerate the simulation so that its cost is often independent of the number of memory accesses.
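The general idea of fast-forwarding an explicit simulation can be illustrated with a toy sketch. It is not the algorithm of the paper: it assumes a fully associative LRU cache and a workload that repeats the same block sequence, and it only skips ahead when a whole repetition leaves the cache state exactly unchanged. All names (`run_iteration`, `simulate_plain`, `simulate_fast_forward`) are hypothetical.

```python
from collections import OrderedDict


def run_iteration(cache, assoc, blocks):
    """Simulate one pass over `blocks` on a fully associative LRU cache.

    `cache` is an OrderedDict ordered from least to most recently used.
    Returns the number of misses in this pass.
    """
    misses = 0
    for b in blocks:
        if b in cache:
            cache.move_to_end(b)           # hit: becomes most recently used
        else:
            misses += 1
            cache[b] = True                # miss: insert as most recently used
            if len(cache) > assoc:
                cache.popitem(last=False)  # evict the least recently used block
    return misses


def simulate_plain(assoc, blocks, repeats):
    """Trace-style simulation: cost grows with the number of accesses."""
    cache = OrderedDict()
    return sum(run_iteration(cache, assoc, blocks) for _ in range(repeats))


def simulate_fast_forward(assoc, blocks, repeats):
    """Simulate explicitly until one pass leaves the cache state unchanged,
    then account for all remaining passes in a single step."""
    cache = OrderedDict()
    total, done = 0, 0
    while done < repeats:
        before = list(cache)               # snapshot of the LRU order
        misses = run_iteration(cache, assoc, blocks)
        total += misses
        done += 1
        if list(cache) == before:
            # Same start state and same accesses imply the same misses and
            # the same end state for every remaining pass.
            total += misses * (repeats - done)
            break
    return total


if __name__ == "__main__":
    blocks = list(range(8))
    print(simulate_plain(4, blocks, 1000))
    print(simulate_fast_forward(4, blocks, 1000))
```

In this sketch the fast-forwarding version stops simulating after a small number of passes regardless of `repeats`, which mirrors, in a much simplified setting, the goal of making simulation cost independent of the number of memory accesses.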