ABSTRACT
Server systems with large amounts of physical memory can benefit from using some of that capacity for in-memory snapshots of ongoing computations. In-memory snapshots are useful for services such as scaling out new workload instances, debugging, and scheduling, none of which require snapshots to persist across node crashes or reboots. Because servers increasingly run containerized workloads, using technologies such as Docker, the snapshot and subsequent restore mechanisms would be applied at the granularity of containers. However, CRIU, the current approach to snapshotting and restoring containers, suffers from expensive filesystem write/read operations on image files containing memory pages; these operations dominate the runtime costs and diminish the potential benefits of manipulating in-memory process state.
In this paper, we demonstrate that these overheads can be eliminated by using MVAS, kernel support for multiple independent virtual address spaces (VAS) designed specifically for machines with large memory capacities. The resulting VAS-CRIU stores application memory as a separate snapshot address space in DRAM and avoids costly filesystem operations. This accelerates the snapshot/restore of address spaces by two orders of magnitude, resulting in an overall reduction in snapshot time by up to 10× and restore time by up to 9×. We demonstrate the utility of VAS-CRIU for container management services such as fine-grained snapshot generation and container instance scaling.
Index Terms
Fast in-memory CRIU for docker containers