research-article

Optimizing crash dump in virtualized environments

Published: 17 March 2010

Abstract

A crash dump, or core dump, is the typical way to save a system's memory image when it crashes, for later offline debugging and analysis. However, on server machines, which typically have abundant memory, writing the core dump can take long enough to significantly increase the mean time to repair (MTTR) by delaying reboot-based recovery. Yet skipping the dump of the failure context risks repeated crashes from the same, still undiagnosed problem.

In this paper, we propose several techniques that optimize core dump in virtualized environments in order to shorten the MTTR of consolidated virtual machines (VMs) after a crash. First, we run the crash dump in parallel with rebooting the crashed VM, dynamically reclaiming memory from the crashed VM and allocating it to the newly spawned VM. Second, we use the virtual machine management layer to introspect critical data structures of the crashed VM and filter unused memory out of the dump. Finally, we apply disk I/O rate control between the core dump and the newly spawned VM, according to a user-tuned policy, to balance crash dump time against the quality of service in the recovery VM.
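
The interplay of the three techniques above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not Vicover's implementation: it assumes the set of in-use page frames has already been obtained by introspecting the crashed guest, models the user-tuned rate-control policy as a fixed share of disk bandwidth, and advances a simulated clock instead of calling any real Xen interface. The names `RateLimiter`, `dump_and_reclaim`, `disk_bw`, `dump_share`, and the chunk size are hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical model for illustration only; no real Xen APIs are used.
PAGE_SIZE = 4096  # bytes; assumes 4 KiB guest pages


class RateLimiter:
    """Paces dump writes at a fixed share of disk bandwidth (hypothetical
    stand-in for a user-tuned policy). Uses a simulated clock, not real I/O."""

    def __init__(self, disk_bw: float, dump_share: float) -> None:
        if not 0.0 < dump_share <= 1.0:
            raise ValueError("dump_share must be in (0, 1]")
        self.rate = disk_bw * dump_share  # bytes/s reserved for the dump
        self.clock = 0.0                  # simulated seconds elapsed

    def write(self, nbytes: int) -> None:
        # Advance the clock by the time the write takes at the throttled rate;
        # the remaining bandwidth is assumed to be left to the recovery VM.
        self.clock += nbytes / self.rate


def dump_and_reclaim(
    num_pages: int, in_use: Set[int], chunk_pages: int, limiter: RateLimiter
) -> Tuple[List[int], List[Tuple[float, int]]]:
    """Dump only in-use pages of the crashed VM, chunk by chunk, and record
    when each chunk's memory can be handed to the recovery VM."""
    dumped: List[int] = []
    released: List[Tuple[float, int]] = []  # (simulated time, pages freed)
    for start in range(0, num_pages, chunk_pages):
        chunk = range(start, min(start + chunk_pages, num_pages))
        for pfn in chunk:
            if pfn in in_use:  # skip frames introspection reported as unused
                limiter.write(PAGE_SIZE)
                dumped.append(pfn)
        # Once the chunk is saved, all of its frames (dumped or skipped)
        # can be reclaimed from the crashed VM and given to the new VM.
        released.append((limiter.clock, len(chunk)))
    return dumped, released


if __name__ == "__main__":
    limiter = RateLimiter(disk_bw=4 * PAGE_SIZE, dump_share=0.5)  # 2 pages/s
    dumped, released = dump_and_reclaim(16, {0, 1, 2, 5, 9, 15}, 4, limiter)
    print(dumped)    # [0, 1, 2, 5, 9, 15]
    print(released)  # [(1.5, 4), (2.0, 4), (2.5, 4), (3.0, 4)]
```

With 16 guest pages of which 6 are in use, and the dump throttled to half of a 4-page-per-second disk, only the 6 in-use pages are written and the last chunk is released after 3.0 simulated seconds; writing all 16 pages at the same rate would take 8.0 seconds. Releasing memory chunk by chunk, rather than only after the whole dump completes, is what lets the recovery VM grow while the dump is still running.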

We have implemented a working prototype, Vicover, that optimizes core dump when a virtual machine crashes in Xen, minimizing the combined MTTR of core dump and recovery. In our experiment on a virtualized TPC-W server, Vicover reduces the downtime caused by crash dump by a factor of about 5.

Published in
ACM SIGPLAN Notices, Volume 45, Issue 7 (VEE '10), July 2010, 161 pages
ISSN: 0362-1340; EISSN: 1558-1160; DOI: 10.1145/1837854

VEE '10: Proceedings of the 6th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, March 2010, 176 pages
ISBN: 9781605589107; DOI: 10.1145/1735997

Copyright © 2010 ACM
Publisher: Association for Computing Machinery, New York, NY, United States