research-article

Transparent mutable replay for multicore debugging and patch validation

Published: 16 March 2013

Abstract

We present Dora, a mutable record-replay system which allows a recorded execution of an application to be replayed with a modified version of the application. This feature, not available in previous record-replay systems, enables powerful new functionality. In particular, Dora can help reproduce, diagnose, and fix software bugs by replaying a version of a recorded application that is recompiled with debugging information, reconfigured to produce verbose log output, modified to include additional print statements, or patched to fix a bug.

Dora uses lightweight operating system mechanisms to record an application execution by capturing nondeterministic events to a log without imposing unnecessary timing and ordering constraints. It replays the log using a modified version of the application even in the presence of added, deleted, or modified operations that do not match events in the log. Dora searches for a replay that minimizes differences between the log and the replayed execution of the modified program. If there are no modifications, Dora provides deterministic replay of the unmodified program.
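The idea of replaying a log against a modified program while minimizing differences can be illustrated with a toy sketch. This is not Dora's actual search: it substitutes a plain edit-distance alignment, and the event format, the `align` function, and the `replay`/`live`/`skip` labels are all hypothetical names chosen for illustration. Operations that match a log entry reuse the recorded result, operations missing from the log are treated as new and run live, and log entries the modified program never issues are skipped.

```python
from typing import List, Optional, Tuple

# Hypothetical log format: (operation name, recorded result).
Event = Tuple[str, str]
Step = Tuple[str, str, Optional[str]]


def align(log: List[Event], ops: List[str]) -> Tuple[int, List[Step]]:
    """Align the operations of a modified program with a recorded log.

    Returns (cost, plan), where cost counts the operations that are
    added (not in the log) or deleted (in the log but never issued),
    and plan is one alignment achieving that minimal cost.
    """
    n, m = len(log), len(ops)
    # cost[i][j] = minimal number of differences aligning log[i:] with ops[j:]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue  # both exhausted: zero differences
            candidates = []
            if i < n:
                candidates.append(1 + cost[i + 1][j])   # skip a log event
            if j < m:
                candidates.append(1 + cost[i][j + 1])   # run a new op live
            if i < n and j < m and log[i][0] == ops[j]:
                candidates.append(cost[i + 1][j + 1])   # op matches the log
            cost[i][j] = min(candidates)

    # Walk forward through the table to recover one minimal plan.
    plan: List[Step] = []
    i = j = 0
    while i < n or j < m:
        if (i < n and j < m and log[i][0] == ops[j]
                and cost[i][j] == cost[i + 1][j + 1]):
            plan.append(("replay", ops[j], log[i][1]))
            i, j = i + 1, j + 1
        elif j < m and cost[i][j] == 1 + cost[i][j + 1]:
            plan.append(("live", ops[j], None))
            j += 1
        else:
            plan.append(("skip", log[i][0], log[i][1]))
            i += 1
    return cost[0][0], plan


# Recorded run, then a "modified" run that adds one write (e.g. a print).
recorded = [("open", "fd=3"), ("read", "hello"), ("close", "0")]
modified_ops = ["open", "write", "read", "close"]
differences, plan = align(recorded, modified_ops)
```

With an unmodified operation sequence the alignment has zero differences and every step is a `replay`, which mirrors the deterministic-replay case described above.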

We have implemented a Linux prototype which provides transparent mutable replay without recompiling or relinking applications. We show that Dora is useful for reproducing, diagnosing, and fixing software bugs in real-world applications, including Apache and MySQL. Our results show that Dora (1) captures bugs and replays them with applications modified or reconfigured to produce additional debugging output for root cause diagnosis, (2) captures exploits and replays them with patched applications to validate that the patches successfully eliminate vulnerabilities, (3) records production workloads and replays them with patched applications to validate patches with realistic workloads, and (4) maintains low recording overhead on commodity multicore hardware, making it suitable for production systems.

Published in

ACM SIGPLAN Notices, Volume 48, Issue 4 (ASPLOS '13)
April 2013
540 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2499368

ASPLOS '13: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
March 2013
574 pages
ISBN: 9781450318709
DOI: 10.1145/2451116

Copyright © 2013 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
