Abstract
Checkpointing, i.e., recording the volatile state of a virtual machine (VM) running as a guest in a virtual machine monitor (VMM) for later restoration, includes storing the memory available to the VM. Typically, a full image of the VM's memory is recorded along with the processor and device states. With guest memory sizes of up to several gigabytes, the size of the checkpoint images is a growing concern.
In this work we present a technique for fast and space-efficient checkpointing of virtual machines. In contrast to existing methods, our technique eliminates redundant data and stores only a subset of the VM's memory pages. Our technique transparently tracks I/O operations of the guest to external storage and maintains a list of memory pages whose contents are duplicated on non-volatile storage. At a checkpoint, these pages are excluded from the checkpoint image.
We have implemented the proposed technique for paravirtualized as well as fully-virtualized guests in the Xen VMM. Our experiments with a paravirtualized guest (Linux) and two fully-virtualized guests (Linux, Windows) show a significant reduction in the size of the checkpoint image as well as the time required to complete the checkpoint. Compared to the current Xen implementation, we achieve, on average, an 81% reduction in the stored data and a 74% reduction in the time required to take a checkpoint for the paravirtualized Linux guest. In a fully-virtualized environment running Windows and Linux guests, we achieve a 64% reduction of the image size along with a 62% reduction in checkpointing time.
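The page-tracking idea described above can be illustrated with a minimal sketch. It is not the Xen implementation: the class `PageTracker`, its methods, and the `restore` helper are hypothetical names chosen for illustration, memory is modeled as a Python list of page contents, and the disk as a dictionary from block number to contents. The sketch assumes the tracker is notified of every guest disk transfer and of every guest write to a tracked page.

```python
class PageTracker:
    """Hypothetical tracker of memory pages whose contents also exist on disk."""

    def __init__(self, num_pages):
        self.num_pages = num_pages
        # page frame number -> disk block that holds identical contents
        self.on_disk = {}

    def on_disk_io(self, pfn, block, is_write=False):
        """Record a transfer between page `pfn` and disk block `block`."""
        if is_write:
            # The block is being overwritten, so any other page that was
            # recorded as a copy of it may now be stale; drop it conservatively.
            stale = [p for p, b in self.on_disk.items() if b == block and p != pfn]
            for p in stale:
                del self.on_disk[p]
        # After a read or a write, the page and the block hold the same data.
        self.on_disk[pfn] = block

    def on_memory_write(self, pfn):
        """The guest modified the page, so it no longer matches the disk."""
        self.on_disk.pop(pfn, None)

    def checkpoint(self, memory):
        """Return (saved pages, page-to-block map); duplicated pages are skipped."""
        saved = {pfn: memory[pfn]
                 for pfn in range(self.num_pages)
                 if pfn not in self.on_disk}
        return saved, dict(self.on_disk)


def restore(saved, page_map, disk, num_pages):
    """Rebuild memory from the checkpoint image plus blocks re-read from disk."""
    return [saved[pfn] if pfn in saved else disk[page_map[pfn]]
            for pfn in range(num_pages)]
```

In this model, a page that was read from disk and never modified is left out of the image and fetched from its disk block on restore, while a page that was written to disk and then changed in memory is stored in the image as usual.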
Fast and space-efficient virtual machine checkpointing
VEE '11: Proceedings of the 7th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments