Performance Implications of Extended Page Tables on Virtualized x86 Processors

Published: 25 March 2016

Abstract

Managing virtual memory is an expensive operation, and it becomes even more expensive on virtualized servers. Processing TLB misses on a virtualized x86 server requires a two-dimensional page walk that can have 6x more page table lookups, and hence 6x more memory references, than a native page table walk. Thus, much of the recent research on the subject starts from the assumption that TLB miss processing in virtual environments is significantly more expensive than on native servers. However, we will show that with the latest software stack on modern x86 processors, most of these page-table lookups are satisfied by internal paging structure caches and the L1/L2 data caches, and the actual virtualization overhead of TLB miss processing is a modest fraction of the overall time spent processing TLB misses.
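To make the 6x figure concrete, the following is a minimal sketch of the worst-case reference count, not code from the paper. It assumes four-level page tables in both the guest and the host (the usual x86-64 layout with 4 KB pages) and no hits in the TLB, paging structure caches, or data caches. The function names are hypothetical.

```python
def native_walk_refs(levels: int = 4) -> int:
    """Memory references for a native walk: one page-table entry read per level."""
    return levels


def nested_walk_refs(guest_levels: int = 4, host_levels: int = 4) -> int:
    """Worst-case memory references for a two-dimensional (nested) walk.

    Assumption: nothing is cached. Each guest page-table pointer is a
    guest-physical address, so reading a guest entry costs one host walk
    (host_levels references) plus the read of the guest entry itself.
    The final guest-physical address of the data needs one more host walk.
    """
    per_guest_level = host_levels + 1
    return guest_levels * per_guest_level + host_levels


native = native_walk_refs()
nested = nested_walk_refs()
print(f"native: {native}, nested: {nested}, ratio: {nested / native:.0f}x")
```

Under these assumptions the nested walk needs 24 references against 4 for the native walk, which is the 6x ratio cited above. Using large pages shortens the walk in either dimension, which can be explored by passing smaller values for `guest_levels` or `host_levels`.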

In this paper, we present a detailed accounting of the TLB miss processing costs on virtualized x86 servers for an exhaustive set of workloads, in particular, two very demanding industry standard workloads. We show that an implementation of the TPC-C workload that actively uses 475 GB of memory on a 72-CPU Haswell-EP server spends 20% of its time processing TLB misses when the application runs in a VM. Although this is a non-trivial amount, it is only 4.2% higher than the TLB miss processing costs on bare metal. The multi-VM VMmark benchmark sees 12.3% in TLB miss processing, but only 4.3% of that can be attributed to virtualization overheads. We show that even for the heaviest workloads, a well-tuned application that uses large pages on a recent OS release with a modern hypervisor running on the latest x86 processors sees only minimal degradation from the additional overhead of the two-dimensional page walks in a virtualized server.
