Abstract
With DevOps automation and an everything-as-code approach to lifecycle management for cloud-native applications, operational visibility and control become challenging. Once a VM is deployed in production, it typically becomes a hands-off entity: operators avoid inspecting or tuning it for fear of negatively impacting its operation. We present CIVIC (Cloning and Injection based VM Inspection for Cloud), a new mechanism that enables safe, on-the-fly inspection of unmodified production VMs. CIVIC confines all impact and side effects of inspection or analysis operations to a live clone of the production VM. New functionality is introduced over the replicated VM state using code injection. In this paper, we describe the design and implementation of our solution over KVM/QEMU. We demonstrate four use cases: (i) safe reuse of system monitoring agents, (ii) impact-heavy problem diagnostics and troubleshooting, (iii) attaching an intrusive anomaly detector to a live service, and (iv) live tuning of a webserver's configuration parameters. Our evaluation shows that CIVIC is nimble and lightweight in terms of memory footprint and clone activation time (6.5 s), and has a low impact on the original VM (< 10%).
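The isolation idea in the abstract (side effects stay inside a clone while the production VM is untouched) can be illustrated with a toy model. The sketch below is not CIVIC's implementation: it assumes a page-granular copy-on-write scheme, and `ToyVM`, `ToyClone`, and `inject` are hypothetical names introduced purely for illustration.

```python
from types import MappingProxyType


class ToyClone:
    """Hypothetical clone: reads fall through to a frozen snapshot of the
    parent's pages; writes land in a private overlay (copy-on-write)."""

    def __init__(self, base):
        self._base = base      # read-only snapshot taken at clone time
        self._private = {}     # pages modified inside the clone only

    def read(self, page_no):
        return self._private.get(page_no, self._base[page_no])

    def write(self, page_no, data):
        self._private[page_no] = data

    def inject(self, fn):
        # Stand-in for code injection: run new functionality against
        # the replicated state, never against the parent.
        return fn(self)

    def private_page_count(self):
        return len(self._private)


class ToyVM:
    """Hypothetical production VM whose memory is a dict of page -> bytes."""

    def __init__(self, pages):
        self.pages = dict(pages)

    def clone(self):
        # Only the page table is copied; the immutable page contents
        # are shared until the clone writes to them.
        return ToyClone(MappingProxyType(dict(self.pages)))


prod = ToyVM({0: b"max_clients=150", 1: b"application data"})
clone = prod.clone()

# An intrusive "tuning" operation, confined to the clone.
clone.inject(lambda vm: vm.write(0, b"max_clients=400"))

print(prod.pages[0])               # production page is unchanged
print(clone.read(0))               # clone sees its own modified page
print(clone.private_page_count())  # only the touched page was duplicated
```

In this toy model the clone's memory cost grows only with the pages it modifies, which is one plausible reading of the abstract's claim of a lightweight memory footprint; the actual mechanism is described in the paper itself.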
Safe Inspection of Live Virtual Machines
VEE '17: Proceedings of the 13th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments