Abstract
This paper introduces gMig, an open-source and practical GPU live migration solution for full virtualization. By taking advantage of the dirty pattern of GPU workloads, gMig combines One-Shot Pre-Copy with a hashing-based Software Dirty Page technique to achieve efficient GPU live migration. Specifically, we propose three techniques for gMig: 1) Dynamic Graphics Address Remapping, which parses and manipulates GPU commands to adjust the address mapping so that it fits the different environment after migration; 2) Software Dirty Page, which uses a hashing-based approach to detect page modification, works around the hardware limitation of commodity GPUs, and speeds up migration by sending only the dirtied pages; and 3) One-Shot Pre-Copy, which greatly reduces the number of pre-copy rounds for graphics memory. Our evaluation shows that gMig achieves GPU live migration with an average downtime of 302 ms on Windows and 119 ms on Linux. With Software Dirty Page, the number of GPU pages transferred during the downtime is reduced by 80.0%.
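To make the hashing-based idea behind Software Dirty Page concrete, the following is a minimal sketch and not gMig's actual implementation. It assumes a 4 KiB page size and SHA-1 as the hash, neither of which is stated in the abstract, and the function names page_hashes and dirty_pages are hypothetical. It records one digest per page at pre-copy time, then re-hashes at downtime and reports only the pages whose digest changed.

```python
import hashlib
from typing import List, Sequence

PAGE_SIZE = 4096  # assumed page size; the abstract does not specify one


def page_hashes(memory: bytes, page_size: int = PAGE_SIZE) -> List[bytes]:
    """Return one digest per page of a graphics-memory snapshot."""
    return [
        hashlib.sha1(memory[off:off + page_size]).digest()  # hash choice is an assumption
        for off in range(0, len(memory), page_size)
    ]


def dirty_pages(old_hashes: Sequence[bytes], memory: bytes,
                page_size: int = PAGE_SIZE) -> List[int]:
    """Return indices of pages whose current digest differs from the recorded one."""
    new_hashes = page_hashes(memory, page_size)
    return [i for i, (old, new) in enumerate(zip(old_hashes, new_hashes)) if old != new]


if __name__ == "__main__":
    memory = bytearray(4 * PAGE_SIZE)         # four zero-filled pages
    recorded = page_hashes(bytes(memory))     # digests taken when pages were pre-copied
    memory[PAGE_SIZE + 10] = 0xFF             # the workload modifies page 1
    print(dirty_pages(recorded, bytes(memory)))  # prints [1]
```

In this sketch only page 1 would need to be resent during downtime. Comparing digests instead of full page contents keeps the recorded state small, at the cost of a very small chance that a hash collision hides a modified page.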
gMig: Efficient GPU Live Migration Optimized by Software Dirty Page for Full Virtualization
VEE '18: Proceedings of the 14th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments