Abstract
Virtualization technologies have been widely adopted by large-scale cloud computing platforms. These virtualized systems employ distributed resource management (DRM) to achieve high resource utilization and energy savings by dynamically migrating and consolidating virtual machines. DRM schemes usually use operating-system-level metrics, such as CPU utilization, memory capacity demand, and I/O utilization, to detect and balance resource contention. However, they are oblivious to microarchitecture-level resource interference (e.g., memory bandwidth contention between different VMs running on a host), which is currently not exposed to the operating system.
We observe that the lack of visibility into microarchitecture-level resource interference significantly impacts the performance of virtualized systems. Motivated by this observation, we propose a novel architecture-aware DRM scheme (ADRM) that takes into account microarchitecture-level resource interference when making migration decisions in a virtualized cluster. ADRM makes use of three core techniques: 1) a profiler to monitor the microarchitecture-level resource usage behavior online for each physical host, 2) a memory bandwidth interference model to assess the interference degree among virtual machines on a host, and 3) a cost-benefit analysis to determine a candidate virtual machine and a host for migration.
Real system experiments on thirty randomly selected combinations of applications from the CPU2006, PARSEC, STREAM, and NAS Parallel Benchmark suites in a four-host virtualized cluster show that ADRM can improve performance by up to 26.55%, with an average of 9.67%, compared to traditional DRM schemes that lack visibility into microarchitecture-level resource utilization and contention.
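To make the three techniques named in the abstract more concrete, the following is a minimal Python sketch of how profiled bandwidth data, an interference estimate, and a cost-benefit check could fit together. It is not the paper's algorithm: the abstract does not give the model or the cost function, so the names `VM`, `Host`, and `pick_migration`, the use of bandwidth utilization as the interference degree, the 0.8 threshold, and the memory-footprint migration cost are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class VM:
    name: str
    bandwidth_gbps: float  # profiled memory bandwidth demand (assumed unit)
    memory_gb: float       # footprint, used here as a stand-in for migration cost


@dataclass
class Host:
    name: str
    peak_bandwidth_gbps: float
    vms: Dict[str, VM] = field(default_factory=dict)

    def bandwidth_utilization(self) -> float:
        # Assumed interference proxy: total demand over peak host bandwidth.
        demand = sum(vm.bandwidth_gbps for vm in self.vms.values())
        return demand / self.peak_bandwidth_gbps


def pick_migration(
    hosts: List[Host],
    threshold: float = 0.8,     # hypothetical contention threshold
    cost_per_gb: float = 0.01,  # hypothetical cost weight per GB moved
) -> Optional[Tuple[str, str, str]]:
    """Return (vm, source host, destination host) with the best net benefit,
    or None when no host is contended or no move is worthwhile."""
    best: Optional[Tuple[str, str, str]] = None
    best_net = 0.0
    for src in hosts:
        if src.bandwidth_utilization() <= threshold:
            continue  # host is not considered contended
        for vm in src.vms.values():
            for dst in hosts:
                if dst is src:
                    continue
                added = vm.bandwidth_gbps / dst.peak_bandwidth_gbps
                if dst.bandwidth_utilization() + added > threshold:
                    continue  # the move would only shift the contention
                # Benefit: utilization relieved on the source host.
                benefit = vm.bandwidth_gbps / src.peak_bandwidth_gbps
                # Cost: grows with the amount of memory to transfer.
                cost = cost_per_gb * vm.memory_gb
                net = benefit - cost
                if net > best_net:
                    best_net = net
                    best = (vm.name, src.name, dst.name)
    return best
```

In this sketch a migration is proposed only when the source host exceeds the threshold, the destination stays below it after the move, and the relieved utilization outweighs the assumed migration cost. A real system would obtain the bandwidth figures from hardware performance counters rather than from static fields.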
A-DRM: Architecture-aware Distributed Resource Management of Virtualized Clusters
VEE '15: Proceedings of the 11th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments