Abstract
Processors and operating systems (OSes) support multiple memory page sizes. Superpages increase Translation Lookaside Buffer (TLB) hits, while small pages provide fine-grained memory protection. Ideally, TLBs should perform well for any distribution of page sizes. In reality, set-associative TLBs, which are used frequently because they are more energy-efficient than fully-associative TLBs, cannot easily support multiple page sizes concurrently. Instead, commercial systems typically implement separate set-associative TLBs for different page sizes. This means that when superpages are allocated aggressively, TLB misses may, counterintuitively, increase even if entries for small pages remain unused (and vice versa). We invent MIX TLBs, energy-frugal set-associative structures that concurrently support all page sizes by exploiting superpage allocation patterns. MIX TLBs boost the performance (often by 10-30%) of big-memory applications on native CPUs, virtualized CPUs, and GPUs. MIX TLBs are simple and require no OS or program changes.
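The difficulty with set-associative lookup can be illustrated with a small sketch. It is not the MIX TLB design; it only shows, under assumed parameters (4 KB and 2 MB page sizes, a hypothetical 16-set TLB, and a made-up virtual address), that the set index is taken from the virtual page number, which in turn depends on a page size the hardware does not know before the lookup.

```python
# Illustrative only: the page sizes, set count, and address below are
# assumptions for this sketch, not parameters taken from the paper.
PAGE_SHIFT = {"4KB": 12, "2MB": 21}  # log2 of the page size in bytes
NUM_SETS = 16                         # assumed number of TLB sets (power of two)


def set_index(vaddr: int, page_size: str) -> int:
    """Return the TLB set selected for vaddr if it lies in a page of page_size."""
    vpn = vaddr >> PAGE_SHIFT[page_size]  # drop the page-offset bits
    return vpn % NUM_SETS                 # low-order VPN bits select the set


vaddr = 0x7F12_3456_7000  # hypothetical virtual address
index_if_small = set_index(vaddr, "4KB")
index_if_super = set_index(vaddr, "2MB")
print(index_if_small, index_if_super)
```

For this address the two candidate indices differ, so a single set-associative structure would have to probe more than one set, or know the page size in advance, to find the translation. Splitting the TLB by page size avoids that ambiguity at the cost of the underutilization described above.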
Efficient Address Translation for Architectures with Multiple Page Sizes
ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems