ABSTRACT
Nested paging is a hardware solution for alleviating the software memory management overhead imposed by system virtualization. Nested paging complements existing page walk hardware to form a two-dimensional (2D) page walk, which reduces the need for hypervisor intervention in guest page table management. However, the extra dimension also increases the maximum number of architecturally required page table references.
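The growth in page table references can be illustrated with a short calculation. The sketch below is not taken from the paper's text; it assumes a radix page table with a fixed number of levels in each dimension and no caching of any kind, and the function name `max_walk_references` is hypothetical. Every guest page table pointer, plus the final guest physical address, must itself be translated by a full nested walk.

```python
def max_walk_references(guest_levels: int, nested_levels: int) -> int:
    """Upper bound on page entry references for one uncached 2D walk.

    Assumption: the guest table root and the output of each guest level
    are guest physical addresses, so each one needs a nested walk of
    `nested_levels` references. The guest entries themselves add
    `guest_levels` further references.
    """
    nested_walks = guest_levels + 1  # guest root + output of every guest level
    return nested_walks * nested_levels + guest_levels


if __name__ == "__main__":
    # Assumption: 4-level paging in both dimensions (x86-64 long mode, 4 KB pages).
    print("native walk:", 4)                       # one reference per level
    print("2D walk:", max_walk_references(4, 4))   # 5 * 4 + 4 = 24
```

Under these assumptions a native walk needs at most 4 references while a 2D walk needs up to 24, which is the overhead the rest of the abstract sets out to reduce.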
This paper presents an in-depth examination of the 2D page table walk overhead and options for decreasing it. These options include using the AMD Opteron processor's page walk cache to exploit the strong reuse of page entry references. For a mix of server and SPEC benchmarks, the presented results show a 15%-38% improvement in guest performance by extending the existing page walk cache to also store the nested dimension of the 2D page walk. Caching nested page table translations and skipping multiple page entry references produce an additional 3%-7% improvement.
Much of the remaining 2D page walk overhead is due to low-locality nested page entry references, which result in additional memory hierarchy misses. By using large pages, the hypervisor can eliminate many of these long-latency accesses and further improve the guest performance by 3%-22%.
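One effect of large nested pages can be shown with simple arithmetic: a larger page ends the nested walk at a higher level, so every nested walk inside the 2D walk gets shorter. The sketch below is illustrative only. It assumes x86-64 long-mode paging (4 levels for 4 KB pages, 3 for 2 MB, 2 for 1 GB), 4 KB guest pages, and no caching; the names `NESTED_LEVELS` and `references_for_nested_page_size` are hypothetical. It does not model the memory hierarchy misses that the abstract identifies as the main cost.

```python
# Assumption: number of nested page table levels walked per nested page size
# under x86-64 long-mode paging.
NESTED_LEVELS = {"4K": 4, "2M": 3, "1G": 2}

GUEST_LEVELS = 4  # assumption: guest uses 4 KB pages with 4-level paging


def references_for_nested_page_size(page_size: str) -> int:
    """Uncached worst-case references for one 2D walk.

    The guest root and the output of each guest level each need one
    nested walk; the guest entries add GUEST_LEVELS references.
    """
    nested_levels = NESTED_LEVELS[page_size]
    return (GUEST_LEVELS + 1) * nested_levels + GUEST_LEVELS


if __name__ == "__main__":
    for size in ("4K", "2M", "1G"):
        print(size, references_for_nested_page_size(size))
```

With these assumptions the worst case drops from 24 references with 4 KB nested pages to 19 with 2 MB pages and 14 with 1 GB pages.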
Accelerating two-dimensional page walks for virtualized systems
ASPLOS '08