research-article

Translation-Triggered Prefetching

Published: 04 April 2017

Abstract

We propose translation-enabled memory prefetching optimizations, or TEMPO, a low-overhead hardware mechanism that boosts memory performance by exploiting the operating system's (OS) virtual memory subsystem. We are the first to make the following observations: (1) a substantial fraction (20-40%) of DRAM references in modern big-data workloads are devoted to accessing page tables; and (2) when memory references require page table lookups in DRAM, the vast majority of them (98%+) also go to DRAM for the subsequent data access. TEMPO exploits these observations to enable DRAM row-buffer and on-chip cache prefetching of the data that page tables point to. TEMPO requires trivial changes to the memory controller (under 3% additional area), requires no OS or application changes, and improves performance by 10-30% and energy by 1-14%.
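The core idea in the abstract, that a page-table read served from DRAM reveals where the following data access will land, can be illustrated with a toy model. The sketch below is not the paper's design: the class `ToyMemoryController`, its method names, the 4 KiB page and 64-byte line sizes, and the assumption that the page-walk request carries the page-offset bits are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field

PAGE_SHIFT = 12  # assumption: 4 KiB pages
LINE_SHIFT = 6   # assumption: 64-byte cache lines


@dataclass
class ToyMemoryController:
    """Hypothetical controller: a leaf page-table read triggers a data prefetch."""
    dram: dict                                    # PTE address -> physical frame number
    prefetched: set = field(default_factory=set)  # line numbers staged by the prefetcher
    dram_data_reads: int = 0                      # data reads not covered by a prefetch

    def read_leaf_pte(self, pte_addr: int, page_offset: int) -> int:
        """Serve a page-walk read and stage the line the translation points to."""
        frame = self.dram[pte_addr]
        target = (frame << PAGE_SHIFT) | page_offset
        self.prefetched.add(target >> LINE_SHIFT)
        return frame

    def read_data(self, paddr: int) -> str:
        """Report whether a data read was covered by an earlier prefetch."""
        line = paddr >> LINE_SHIFT
        if line in self.prefetched:
            self.prefetched.discard(line)
            return "prefetch hit"
        self.dram_data_reads += 1
        return "dram access"


# One leaf PTE at address 0x1000 maps to physical frame 0x42.
mc = ToyMemoryController(dram={0x1000: 0x42})
offset = 0x1A8
frame = mc.read_leaf_pte(0x1000, offset)
first = mc.read_data((frame << PAGE_SHIFT) | offset)  # follows the page walk
second = mc.read_data(0x99000)                        # no page walk preceded it
```

In this toy run, the data read that follows the page walk finds its line already staged, while the unrelated read pays the full DRAM access. A real controller would also have to decide where to place the prefetched data (row buffer or on-chip cache), which the model leaves out.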


      • Published in

        ACM SIGPLAN Notices, Volume 52, Issue 4
        ASPLOS '17
        April 2017
        811 pages
        ISSN: 0362-1340
        EISSN: 1558-1160
        DOI: 10.1145/3093336

        ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems
        April 2017
        856 pages
        ISBN: 9781450344654
        DOI: 10.1145/3037697

        Copyright © 2017 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

