research-article

Shoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUs

Published: 25 March 2016

Abstract

Virtual machine based approaches to workload consolidation, as seen in IaaS cloud and datacenter platforms, have long had to contend with performance degradation caused by synchronization primitives inside guest environments. When the host scheduler preempts a virtual CPU, these primitives can incur delays that are orders of magnitude longer than they were designed for. While a significant amount of work has focused on spinlock primitives as a source of these performance issues, spinlocks are not the only synchronization mechanism susceptible to scheduling issues in a virtualized environment. In this paper we address the virtualized performance issues introduced by TLB shootdown operations. Our profiling study, based on the PARSEC benchmark suite, shows that up to 64% of a VM's CPU time can be spent on TLB shootdown operations under certain workloads. To address this problem, we present a paravirtual TLB shootdown scheme named Shoot4U. Shoot4U completely eliminates TLB shootdown preemptions by invalidating guest TLB entries from the VMM, allowing guest TLB shootdown operations to complete without waiting for remote virtual CPUs to be scheduled. Our performance evaluation using the PARSEC benchmark suite demonstrates that Shoot4U can reduce benchmark runtime by up to 85% compared to an unmodified Linux kernel, and by up to 44% compared to a state-of-the-art paravirtual TLB shootdown scheme.
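The latency argument in the abstract can be illustrated with a toy model. The sketch below is not the paper's implementation: the `VCpu` type, both functions, and every cost constant (including the 30 ms wait for a preempted vCPU) are hypothetical values chosen only for illustration. It contrasts a shootdown that must wait for every remote vCPU to be scheduled and acknowledge with a VMM-assisted shootdown whose cost does not depend on when the remote vCPUs next run.

```python
from dataclasses import dataclass


@dataclass
class VCpu:
    vcpu_id: int
    # Time until the host scheduler runs this vCPU again.
    # 0.0 means the vCPU is currently running on a physical CPU.
    wait_until_scheduled_us: float


def ipi_shootdown_latency(targets, ack_cost_us=5.0):
    """Baseline model: the initiator waits until every remote vCPU has
    been scheduled and has acknowledged the invalidation request.
    ack_cost_us is an assumed per-vCPU handling cost."""
    if not targets:
        return 0.0
    return max(v.wait_until_scheduled_us + ack_cost_us for v in targets)


def vmm_assisted_latency(targets, hypercall_cost_us=2.0, invalidate_cost_us=1.0):
    """VMM-assisted model: the initiator issues one request to the VMM,
    which invalidates entries on behalf of each target. The result does
    not depend on when the targets are next scheduled.
    Both cost parameters are assumed values."""
    if not targets:
        return 0.0
    return hypercall_cost_us + invalidate_cost_us * len(targets)


# Two running vCPUs and one preempted vCPU that will not run for 30 ms.
targets = [VCpu(1, 0.0), VCpu(2, 0.0), VCpu(3, 30000.0)]
baseline = ipi_shootdown_latency(targets)
assisted = vmm_assisted_latency(targets)
print(f"baseline: {baseline} us, VMM-assisted: {assisted} us")
```

In this model a single preempted target dominates the baseline latency, while the assisted path grows only with the number of targets. The abstract does not describe how the VMM performs the invalidation, so the model deliberately leaves that step abstract.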


    • Published in

      ACM SIGPLAN Notices, Volume 51, Issue 7
      VEE '16
      July 2016
      167 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/3007611

      • VEE '16: Proceedings of the 12th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments
        March 2016
        186 pages
        ISBN: 9781450339476
        DOI: 10.1145/2892242

      Copyright © 2016 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States
