research-article

Software Prefetching for Indirect Memory Accesses: A Microarchitectural Perspective

Published: 17 June 2019

Abstract

Many modern data-processing and HPC workloads are heavily memory-latency bound. A tempting solution is software prefetching, where special non-blocking loads bring data into the cache hierarchy just before it is required. However, such prefetches are difficult to insert in a way that reliably improves performance, and techniques for automatic insertion are currently limited.

This article develops a novel compiler pass to automatically generate software prefetches for indirect memory accesses, a special class of irregular memory accesses often seen in high-performance workloads. We evaluate this across a wide set of systems, all of which benefit from the technique. We then evaluate the extent to which good prefetch instructions are architecture dependent and the class of programs that are particularly amenable. Across a set of memory-bound benchmarks, our automated pass achieves average speedups of 1.3× for an Intel Haswell processor, 1.1× for both an ARM Cortex-A57 and Qualcomm Kryo, 1.2× for a Cortex-A72 and an Intel Kaby Lake, and 1.35× for an Intel Xeon Phi Knights Landing, each of which is an out-of-order core, and performance improvements of 2.1× and 2.7× for the in-order ARM Cortex-A53 and first-generation Intel Xeon Phi.
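To make the idea of an indirect memory access concrete, the following is a minimal hand-written sketch in C, not the output of the compiler pass described above. It assumes a GCC- or Clang-compatible compiler that provides `__builtin_prefetch`; the function name `indirect_sum` and the look-ahead distance `LOOKAHEAD` are hypothetical choices for illustration, and a suitable distance would likely differ between the microarchitectures listed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical look-ahead distance, in loop iterations.
   This is an assumed value; it would need tuning per machine. */
#define LOOKAHEAD 64

/* Sums data[index[i]] for i in [0, n). The address of each data[] load
   depends on a value loaded from index[], which makes it an indirect access. */
int64_t indirect_sum(const int32_t *data, const uint32_t *index, size_t n)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Guard the look-ahead read of index[] so it stays in bounds. */
        if (i + LOOKAHEAD < n) {
            /* Non-blocking hint: request the element that will be needed
               LOOKAHEAD iterations from now (read access, high locality). */
            __builtin_prefetch(&data[index[i + LOOKAHEAD]], 0, 3);
        }
        sum += data[index[i]];
    }
    return sum;
}
```

The prefetch is only a hint, so the function returns the same result with or without it; the bounds check is there because computing the prefetch address requires a real load from `index`, which must not run past the end of the array.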



      • Published in

        ACM Transactions on Computer Systems  Volume 36, Issue 3
        August 2018
        99 pages
        ISSN: 0734-2071
        EISSN: 1557-7333
        DOI: 10.1145/3341160
        Copyright © 2019 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 17 June 2019
        • Accepted: 1 March 2019
        • Revised: 1 September 2018
        • Received: 1 December 2017


        Qualifiers

        • research-article
        • Research
        • Refereed
