Abstract
Memory Management Units (MMUs) for on-device address translation are widely used in modern devices. However, conventional solutions for on-device MMU virtualization, such as the shadow page tables implemented in mediated pass-through, still suffer from high complexity and low performance.
We present Demon, an efficient solution for on-DEvice MMU virtualizatiON in mediated pass-through. Its key insight is to use the IOMMU to construct a two-dimensional address translation and to dynamically switch the second-dimension page table to the appropriate candidate whenever the device owner switches. To support fine-grained parallelism on devices with multiple engines, we put forward a hardware proposal that separates the address space of each engine and enables simultaneous device address remapping for multiple virtual machines (VMs). We implement Demon in a prototype named gDemon, which virtualizes the Intel GPU MMU; Demon itself, however, is not limited to this particular case. Evaluations show that, compared with gVirt, gDemon provides up to 19.73x better performance in media transcoding workloads and improves performance by up to 17.09% and 13.73% in 2D and 3D benchmarks, respectively. The current release of gDemon scales up to 6 VMs with moderate performance in our experiments. In addition, gDemon simplifies the implementation of GPU MMU virtualization with a 37% code reduction.
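The two-dimensional translation summarized above can be illustrated with a toy model. This is a minimal sketch, not gDemon's implementation: the names `VM`, `Device`, `walk`, and `switch_owner`, the single-level dictionaries standing in for page tables, and the 4 KiB page size are all assumptions made for the example. The first dimension is the guest-managed on-device page table (device virtual address to guest physical address); the second is a per-VM IOMMU page table (guest physical address to host physical address) that is swapped when the device owner changes.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12                      # assumed 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class TranslationFault(Exception):
    """Raised when a page has no mapping in the table being walked."""


def walk(table, addr):
    """Translate addr through a single-level page table (page -> page)."""
    page, offset = addr >> PAGE_SHIFT, addr & PAGE_MASK
    if page not in table:
        raise TranslationFault(hex(addr))
    return (table[page] << PAGE_SHIFT) | offset


@dataclass
class VM:
    name: str
    device_page_table: dict   # 1st dimension: device virtual page -> guest physical page
    iommu_page_table: dict    # 2nd dimension: guest physical page -> host physical page


class Device:
    """Hypothetical device whose 2nd-dimension table follows the current owner."""

    def __init__(self):
        self.owner = None
        self.active_iommu_table = None

    def switch_owner(self, vm):
        # On an owner switch, point the IOMMU at the new owner's page table.
        self.owner = vm
        self.active_iommu_table = vm.iommu_page_table

    def translate(self, device_virtual_addr):
        if self.owner is None:
            raise TranslationFault("device has no owner")
        guest_physical = walk(self.owner.device_page_table, device_virtual_addr)
        return walk(self.active_iommu_table, guest_physical)


if __name__ == "__main__":
    vm_a = VM("vm_a", {0x10: 0x2}, {0x2: 0x80})
    vm_b = VM("vm_b", {0x10: 0x2}, {0x2: 0x90})
    gpu = Device()

    gpu.switch_owner(vm_a)
    print(hex(gpu.translate(0x10123)))   # prints 0x80123

    gpu.switch_owner(vm_b)
    print(hex(gpu.translate(0x10123)))   # prints 0x90123
```

In this model both guests use the same device virtual and guest physical addresses, yet each access lands in a different host page because only the second-dimension table changes at the owner switch. No shadow copy of the guest's on-device page table is maintained here.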
Demon: An Efficient Solution for on-Device MMU Virtualization in Mediated Pass-Through
VEE '18: Proceedings of the 14th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments