PLDI Conference Proceedings, DOI: 10.1145/3519939.3523714

Warping cache simulation of polyhedral programs

Canberk Morelli and Jan Reineke

Published: 09 June 2022

ABSTRACT

Techniques to evaluate a program's cache performance fall into two camps:

  1. Traditional trace-based cache simulators precisely account for sophisticated real-world cache models and support arbitrary workloads, but their runtime is proportional to the number of memory accesses performed by the program under analysis.
  2. Relying on implicit workload characterizations such as the polyhedral model, analytical approaches often achieve problem-size-independent runtimes, but so far have been limited to idealized cache models.
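To make the first camp concrete, here is a minimal sketch of a trace-driven simulator, assuming a set-associative cache with LRU replacement and simple modulo set indexing (real hardware may use other replacement policies and hashed indexing). The class `LRUCache` and the matrix-vector trace generator `matvec_trace` are hypothetical illustrations, not part of any existing tool. Because `access` is invoked once per memory access, the simulation cost grows linearly with the trace length, which is the limitation described above.

```python
from collections import OrderedDict


class LRUCache:
    """Trace-driven simulator of a set-associative cache with LRU replacement."""

    def __init__(self, num_sets, ways, line_size):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # One OrderedDict per set; keys are memory blocks ordered LRU -> MRU.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Simulate one access; return True on a hit, False on a miss."""
        block = address // self.line_size
        cache_set = self.sets[block % self.num_sets]  # modulo set indexing (assumption)
        if block in cache_set:
            cache_set.move_to_end(block)  # promote to most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(cache_set) == self.ways:
            cache_set.popitem(last=False)  # evict the least recently used block
        cache_set[block] = None
        return False


def matvec_trace(n, elem_size=8):
    """Addresses touched by y[i] += A[i][j] * x[j] (hypothetical row-major layout)."""
    a_base = 0
    x_base = n * n * elem_size
    y_base = x_base + n * elem_size
    for i in range(n):
        for j in range(n):
            yield a_base + (i * n + j) * elem_size
            yield x_base + j * elem_size
            yield y_base + i * elem_size


if __name__ == "__main__":
    cache = LRUCache(num_sets=64, ways=8, line_size=64)  # 32 KiB, 8-way, 64-byte lines
    for addr in matvec_trace(16):
        cache.access(addr)
    print(cache.hits, cache.misses)  # 732 36
```

The 16 by 16 trace has 768 accesses to 36 distinct blocks, all of which fit, so only the 36 cold misses occur. Doubling `n` quadruples the number of `access` calls, regardless of how regular the access pattern is.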

We introduce a hybrid approach, warping cache simulation, that aims to achieve both applicability to real-world cache models and problem-size-independent runtimes. Like prior analytical approaches, we focus on programs in the polyhedral model, which allows us to reason about the sequence of memory accesses analytically. Combining this analytical reasoning with information about the cache behavior obtained from explicit cache simulation allows us to soundly fast-forward the simulation. By this process of warping, we accelerate the simulation so that its cost is often independent of the number of memory accesses.
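The toy below illustrates the fast-forwarding idea under strong simplifying assumptions of our own; it is not the authors' algorithm, which targets general polyhedral programs and real-world cache models. We assume a single loop whose iteration `t` accesses memory blocks `offset + t * stride` for a fixed tuple of offsets (every reference advances by the same stride) and a fully associative LRU cache. LRU behavior is unchanged when all block numbers are shifted by the same amount, so once the cache state before iteration `t` equals the state before an earlier iteration `i` shifted by `(t - i) * stride`, the hit/miss pattern repeats with period `t - i`, and whole periods can be skipped by shifting the state. All function names are hypothetical.

```python
def lru_access(state, block, ways):
    """Access `block` in a fully associative LRU cache.

    `state` is a tuple of blocks ordered from most to least recently used.
    Returns the new state and whether the access hit.
    """
    hit = block in state
    new_state = (block,) + tuple(b for b in state if b != block)
    return new_state[:ways], hit


def simulate(n, offsets, stride, ways):
    """Plain simulation: every one of the n iterations is executed."""
    state, hits = (), 0
    for t in range(n):
        for off in offsets:
            state, hit = lru_access(state, off + t * stride, ways)
            hits += hit
    return hits, state, n


def simulate_warping(n, offsets, stride, ways):
    """Same hits and final state as simulate(), but skips whole periods once
    the translated cache state repeats. Returns (hits, final_state,
    number_of_explicitly_simulated_iterations)."""
    state, hits, t, simulated = (), 0, 0, 0
    seen = {}         # normalized state -> iteration at which it occurred
    hits_before = []  # hits_before[i] = hits accumulated before iteration i
    while t < n:
        if seen is not None:
            # Shift the state back by t*stride so that states differing only
            # by a uniform translation of all blocks compare equal.
            key = tuple(b - t * stride for b in state)
            if key in seen:
                i = seen[key]
                period, period_hits = t - i, hits - hits_before[i]
                skips = (n - t) // period        # whole periods that still fit
                hits += skips * period_hits      # each period repeats the same hits
                shift = skips * period * stride
                state = tuple(b + shift for b in state)
                t += skips * period
                seen = None                      # warp at most once in this toy
                continue
            seen[key] = t
            hits_before.append(hits)
        for off in offsets:
            state, hit = lru_access(state, off + t * stride, ways)
            hits += hit
        t += 1
        simulated += 1
    return hits, state, simulated


if __name__ == "__main__":
    # Three-point stencil: iteration t touches blocks t, t+1, t+2.
    print(simulate_warping(10**6, (0, 1, 2), 1, 4)[::2])  # (1999998, 3)
```

In the stencil example, the normalized state repeats before the fourth iteration, so only three iterations are simulated explicitly and the remaining ones are accounted for in a single step, independent of `n`.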
