research-article

Efficient Address Translation for Architectures with Multiple Page Sizes

Published: 04 April 2017

Abstract

Processors and operating systems (OSes) support multiple memory page sizes. Superpages increase Translation Lookaside Buffer (TLB) hits, while small pages provide fine-grained memory protection. Ideally, TLBs should perform well for any distribution of page sizes. In reality, set-associative TLBs -- used frequently for their energy efficiency compared to fully associative TLBs -- cannot (easily) support multiple page sizes concurrently. Instead, commercial systems typically implement separate set-associative TLBs for different page sizes. This means that when superpages are allocated aggressively, TLB misses may, counterintuitively, increase even if entries for small pages remain unused (and vice versa). We invent MIX TLBs, energy-frugal set-associative structures that concurrently support all page sizes by exploiting superpage allocation patterns. MIX TLBs boost the performance (often by 10-30%) of big-memory applications on native CPUs, virtualized CPUs, and GPUs. MIX TLBs are simple and require no OS or program changes.
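The difficulty the abstract describes, that a set-associative TLB cannot easily hold several page sizes at once, comes from set indexing: the set is chosen from the low-order bits of the virtual page number, and where those bits sit in the address depends on a page size the hardware does not yet know at lookup time. The following is a minimal sketch of that problem only, not of the MIX TLB design. The set count, the example address, and the helper names `set_index` and `candidate_sets` are assumptions made for illustration; the page sizes are the standard x86-64 ones.

```python
# Illustrative sketch: the TLB geometry and helper names are assumptions,
# not taken from the paper.

# Page-offset widths for the standard x86-64 page sizes.
PAGE_SHIFT = {"4KB": 12, "2MB": 21, "1GB": 30}

NUM_SETS = 16  # hypothetical number of TLB sets (a power of two)


def set_index(vaddr, page_size):
    """Return the set selected by the low-order bits of the virtual page number."""
    vpn = vaddr >> PAGE_SHIFT[page_size]
    return vpn % NUM_SETS


def candidate_sets(vaddr):
    """Return the set that would be probed under each possible page size."""
    return {size: set_index(vaddr, size) for size in PAGE_SHIFT}


vaddr = 0x7F1234567000  # arbitrary example address
print(candidate_sets(vaddr))
```

For this address each page size selects a different set, so a single probe cannot cover all three cases. That is why designs typically either probe once per page size or, as the abstract notes for commercial systems, keep separate TLBs per page size.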


Published in

ACM SIGPLAN Notices, Volume 52, Issue 4
ASPLOS '17
April 2017
811 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/3093336

ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems
April 2017
856 pages
ISBN: 9781450344654
DOI: 10.1145/3037697

Copyright © 2017 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
