
A-DRM: Architecture-aware Distributed Resource Management of Virtualized Clusters

Published: 14 March 2015

Abstract

Virtualization technologies have been widely adopted by large-scale cloud computing platforms. These virtualized systems employ distributed resource management (DRM) to achieve high resource utilization and energy savings by dynamically migrating and consolidating virtual machines. DRM schemes usually use operating-system-level metrics, such as CPU utilization, memory capacity demand, and I/O utilization, to detect and balance resource contention. However, they are oblivious to microarchitecture-level resource interference (e.g., memory bandwidth contention between different VMs running on a host), which is currently not exposed to the operating system.
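The abstract does not specify how microarchitecture-level usage is measured; the sketch below is a hypothetical illustration of one common approximation: estimating a host's DRAM traffic from last-level-cache miss counts, on the assumption that each LLC miss fetches one 64-byte cache line. The function name and sample numbers are illustrative, not from the paper.

```python
# Hypothetical sketch: approximating memory bandwidth consumption from
# hardware-counter deltas. Real profilers would read uncore/PMU counters
# (e.g., via perf); here we only show the arithmetic.
CACHE_LINE_BYTES = 64  # typical cache line size on x86 hosts

def approx_bandwidth_mbps(llc_miss_delta, interval_s):
    """Estimate DRAM traffic in MB/s, assuming one cache-line fill per miss."""
    return llc_miss_delta * CACHE_LINE_BYTES / interval_s / 1e6

# Example: 50M LLC misses observed over a 1-second interval.
print(approx_bandwidth_mbps(50_000_000, 1.0))  # -> 3200.0 MB/s
```

An OS scheduler that sees only CPU utilization would report both a bandwidth-bound and a compute-bound VM as "100% busy"; a counter-derived estimate like this is what distinguishes them.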

We observe that the lack of visibility into microarchitecture-level resource interference significantly impacts the performance of virtualized systems. Motivated by this observation, we propose a novel architecture-aware DRM scheme (A-DRM) that takes into account microarchitecture-level resource interference when making migration decisions in a virtualized cluster. A-DRM makes use of three core techniques: 1) a profiler to monitor the microarchitecture-level resource usage behavior online for each physical host, 2) a memory bandwidth interference model to assess the degree of interference among virtual machines on a host, and 3) a cost-benefit analysis to determine a candidate virtual machine and a host for migration.
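The abstract names the three techniques but not their formulas, so the following is a minimal sketch under stated assumptions: interference on a host is modeled as fractional oversubscription of its memory bandwidth capacity, and the migration search greedily picks the (VM, source, destination) move that most reduces the worst host's interference. All names and the scoring rule are hypothetical stand-ins for the paper's actual model.

```python
def interference_score(vm_bandwidths, host_capacity):
    """Fraction by which co-located VMs oversubscribe memory bandwidth (>= 0)."""
    total = sum(vm_bandwidths.values())
    return max(0.0, (total - host_capacity) / host_capacity)

def pick_migration(hosts, capacity):
    """Return (vm, src_host, dst_host) minimizing the cluster's worst
    interference score, or None if no move improves on the status quo.
    hosts: {host_name: {vm_name: bandwidth_mbps}}"""
    baseline = max(interference_score(v, capacity) for v in hosts.values())
    best = None
    for src, vms in hosts.items():
        for vm, bw in vms.items():
            for dst in hosts:
                if dst == src:
                    continue
                # Simulate the move on a copy of the placement.
                trial = {h: dict(v) for h, v in hosts.items()}
                del trial[src][vm]
                trial[dst][vm] = bw
                worst = max(interference_score(v, capacity) for v in trial.values())
                if worst < baseline and (best is None or worst < best[0]):
                    best = (worst, vm, src, dst)
    return None if best is None else best[1:]

# Host A oversubscribes a 10 GB/s budget; moving a VM to host B relieves it.
placement = {"A": {"vm1": 8000, "vm2": 6000}, "B": {"vm3": 2000}}
print(pick_migration(placement, 10000))  # -> ('vm1', 'A', 'B')
```

A real cost-benefit analysis would also charge for the migration itself (e.g., proportional to the VM's memory footprint, per the live-migration cost model) before accepting a move; this sketch models only the benefit side.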

Real-system experiments on thirty randomly selected combinations of applications from the CPU2006, PARSEC, STREAM, and NAS Parallel Benchmark suites in a four-host virtualized cluster show that A-DRM can improve performance by up to 26.55%, with an average of 9.67%, compared to traditional DRM schemes that lack visibility into microarchitecture-level resource utilization and contention.

