DOI: 10.1145/1346281.1346286
Research article, ASPLOS conference proceedings

Accelerating two-dimensional page walks for virtualized systems

Published: 01 March 2008

ABSTRACT

Nested paging is a hardware solution for alleviating the software memory management overhead imposed by system virtualization. Nested paging complements existing page walk hardware to form a two-dimensional (2D) page walk, which reduces the need for hypervisor intervention in guest page table management. However, the extra dimension also increases the maximum number of architecturally required page table references.
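To make the growth in page table references concrete, the following minimal Python sketch enumerates the references of a worst-case 2D walk with no caching at all. It assumes 4-level guest and 4-level nested page tables (as in x86-64 long mode); the function name `two_d_walk` and the labels `gL4`..`gL1` and `nL4`..`nL1` are illustrative and do not come from the abstract.

```python
def two_d_walk(guest_levels=4, nested_levels=4):
    """List the memory references of a worst-case 2D page walk (no caching).

    Assumption: every guest page-table pointer is a guest physical address
    and must be translated by a full nested walk before it can be used.
    """
    refs = []
    # Walk the guest table from its top level (gL4) down to its leaf (gL1).
    for g in range(guest_levels, 0, -1):
        # Translate the guest physical address of the gL<g> table first.
        for n in range(nested_levels, 0, -1):
            refs.append(f"nL{n} (for gL{g})")
        # Only then can the guest entry itself be read.
        refs.append(f"gL{g}")
    # The leaf guest entry yields the data's guest physical address,
    # which needs one final nested walk.
    for n in range(nested_levels, 0, -1):
        refs.append(f"nL{n} (for data gPA)")
    return refs


print(len(two_d_walk()))      # 2D walk: 4 * (4 + 1) + 4 = 24 references
print(len(two_d_walk(4, 0)))  # native walk, no nested dimension: 4 references
```

Under these assumptions the worst case grows from 4 references natively to 24 with nested paging, which is the overhead the caching techniques in the next paragraph aim to reduce.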

This paper presents an in-depth examination of the 2D page table walk overhead and options for decreasing it. These options include using the AMD Opteron processor's page walk cache to exploit the strong reuse of page entry references. For a mix of server and SPEC benchmarks, the presented results show a 15%-38% improvement in guest performance by extending the existing page walk cache to also store the nested dimension of the 2D page walk. Caching nested page table translations and skipping multiple page entry references produce an additional 3%-7% improvement.

Much of the remaining 2D page walk overhead is due to low-locality nested page entry references, which result in additional memory hierarchy misses. By using large pages, the hypervisor can eliminate many of these long-latency accesses and further improve guest performance by 3%-22%.
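The abstract attributes the large-page gains to fewer long-latency accesses. One related structural effect can be sketched with simple arithmetic: in x86-64 long mode a 2 MB mapping terminates the walk one level earlier than a 4 KB mapping, so each nested walk is shorter. The snippet below is an illustration under that assumption, with a 4-level guest using 4 KB pages; the names `NESTED_LEVELS` and `worst_case_refs` are hypothetical, and the counts are worst-case reference counts, not the performance figures quoted above.

```python
# Nested page-table levels referenced per nested walk, by nested page size.
# Assumption: x86-64 long mode with 4-level tables.
NESTED_LEVELS = {"4KB": 4, "2MB": 3}


def worst_case_refs(guest_levels, nested_levels):
    """Worst-case references in an uncached 2D walk.

    Each guest level costs one nested walk plus the guest entry read,
    and the final guest physical address costs one more nested walk.
    """
    return guest_levels * (nested_levels + 1) + nested_levels


for size, levels in NESTED_LEVELS.items():
    print(size, worst_case_refs(4, levels))
# 4KB 24
# 2MB 19
```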


