gMig: Efficient GPU Live Migration Optimized by Software Dirty Page for Full Virtualization

Published: 25 March 2018
Abstract

This paper introduces gMig, an open-source and practical GPU live migration solution for full virtualization. By taking advantage of the dirty pattern of GPU workloads, gMig combines One-Shot Pre-Copy with a hashing-based Software Dirty Page technique to achieve efficient GPU live migration. In particular, we propose three approaches for gMig: 1) Dynamic Graphics Address Remapping, which parses and manipulates GPU commands to adjust the address mapping to the different environment after migration; 2) Software Dirty Page, which uses a hashing-based approach to detect page modification, overcomes the commodity GPU's hardware limitation, and speeds up migration by sending only the dirtied pages; and 3) One-Shot Pre-Copy, which greatly reduces the number of pre-copy rounds for graphics memory. Our evaluation shows that gMig achieves GPU live migration with an average downtime of 302 ms on Windows and 119 ms on Linux. With the help of Software Dirty Page, the number of GPU pages transferred during the downtime is reduced by 80.0%.
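The interplay of Software Dirty Page and One-Shot Pre-Copy can be illustrated with a minimal user-space sketch. It is not gMig's implementation, which operates on graphics memory inside the hypervisor. The page size, the choice of SHA-256, and the names `page_hashes`, `dirty_pages`, and `one_shot_migrate` are assumptions made for illustration only. The sketch copies all pages once while the workload keeps running, records a hash per page, and then resends only the pages whose hash has changed.

```python
import hashlib
from typing import Callable, List, Tuple

PAGE_SIZE = 4096  # assumed page size in bytes (illustrative)


def page_hashes(memory: bytes, page_size: int = PAGE_SIZE) -> List[bytes]:
    """Return one digest per page of a memory snapshot."""
    return [
        hashlib.sha256(memory[off:off + page_size]).digest()
        for off in range(0, len(memory), page_size)
    ]


def dirty_pages(baseline: List[bytes], memory: bytes,
                page_size: int = PAGE_SIZE) -> List[int]:
    """Return indices of pages whose hash differs from the baseline."""
    current = page_hashes(memory, page_size)
    return [i for i, (old, new) in enumerate(zip(baseline, current)) if old != new]


def one_shot_migrate(source: bytearray,
                     run_workload: Callable[[bytearray], None],
                     page_size: int = PAGE_SIZE) -> Tuple[bytearray, List[int]]:
    """Hypothetical one-round pre-copy followed by a hash-based dirty resend."""
    # Pre-copy: transfer every page once and remember what was sent.
    dest = bytearray(source)
    baseline = page_hashes(bytes(source), page_size)

    # The workload keeps modifying the source after the copy was taken.
    run_workload(source)

    # Downtime: the workload is stopped; resend only the pages that changed.
    dirty = dirty_pages(baseline, bytes(source), page_size)
    for i in dirty:
        start = i * page_size
        dest[start:start + page_size] = source[start:start + page_size]
    return dest, dirty
```

In this sketch, a workload that touches two of four pages causes only those two pages to be resent during the simulated downtime, which mirrors the reduction in transferred pages the abstract reports.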

Published in

ACM SIGPLAN Notices, Volume 53, Issue 3 (VEE '18)
March 2018
99 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/3296975

VEE '18: Proceedings of the 14th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments
March 2018
106 pages
ISBN: 9781450355797
DOI: 10.1145/3186411

Copyright © 2018 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Qualifiers

• tutorial
• Research
• Refereed limited
