Abstractions for Practical Virtual Machine Replay

Published: 25 March 2016

Abstract

Efficient deterministic replay of whole operating systems is feasible and useful, so why isn't replay a default part of the software stack? While implementing deterministic replay is hard, we argue that the main reason is the lack of general abstractions for understanding and addressing the significant engineering challenges involved in the development of a replay engine for a modern VMM. We present a design blueprint (a set of abstractions, general principles, and low-level implementation details) for efficient deterministic replay in a modern hypervisor. We build and evaluate our architecture in Xen, a full-featured hypervisor. Our architecture can be readily followed and adopted, enabling replay as a ubiquitous part of a modern virtualization stack.
