Abstract
We present Dora, a mutable record-replay system that allows a recorded execution of an application to be replayed with a modified version of the application. This feature, not available in previous record-replay systems, enables powerful new functionality. In particular, Dora can help reproduce, diagnose, and fix software bugs by replaying a version of a recorded application that has been recompiled with debugging information, reconfigured to produce verbose log output, modified to include additional print statements, or patched to fix a bug.
Dora uses lightweight operating system mechanisms to record an application execution by capturing nondeterministic events to a log without imposing unnecessary timing and ordering constraints. It replays the log using a modified version of the application even in the presence of added, deleted, or modified operations that do not match events in the log. Dora searches for a replay that minimizes differences between the log and the replayed execution of the modified program. If there are no modifications, Dora provides deterministic replay of the unmodified program.
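The idea of minimizing differences between the log and a replayed execution can be illustrated with a toy sketch. The code below is not Dora's search procedure; it swaps in a standard insert/delete edit-distance alignment over two complete event sequences, and the function name `align` and the event names are hypothetical. It shows only what "fewest differences" means when the modified program adds or drops operations.

```python
from typing import List, Tuple


def align(log: List[str], replayed: List[str]) -> Tuple[int, List[Tuple[str, str]]]:
    """Align recorded events with replayed events using the fewest differences.

    Illustrative only: a difference is a logged event with no counterpart in
    the replay ("skip_logged") or a replayed event absent from the log ("added").
    """
    n, m = len(log), len(replayed)
    # cost[i][j] = minimum number of differences between log[i:] and replayed[j:]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][m] = n - i  # remaining logged events must all be skipped
    for j in range(m + 1):
        cost[n][j] = m - j  # remaining replayed events are all additions
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if log[i] == replayed[j]:
                cost[i][j] = cost[i + 1][j + 1]
            else:
                cost[i][j] = 1 + min(cost[i + 1][j], cost[i][j + 1])

    # Walk the table from the start to recover one minimum-cost alignment.
    steps: List[Tuple[str, str]] = []
    i = j = 0
    while i < n and j < m:
        if log[i] == replayed[j]:
            steps.append(("match", log[i]))
            i += 1
            j += 1
        elif cost[i + 1][j] <= cost[i][j + 1]:
            steps.append(("skip_logged", log[i]))
            i += 1
        else:
            steps.append(("added", replayed[j]))
            j += 1
    steps.extend(("skip_logged", event) for event in log[i:])
    steps.extend(("added", event) for event in replayed[j:])
    return cost[0][0], steps


# Hypothetical event names: the modified program adds one print statement.
recorded = ["open", "read", "write", "close"]
modified = ["open", "print", "read", "write", "close"]
differences, alignment = align(recorded, modified)
```

With an unmodified program the two sequences are identical, so the alignment has zero differences, which corresponds to the deterministic-replay case described above.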
We have implemented a Linux prototype that provides transparent mutable replay without recompiling or relinking applications. We show that Dora is useful for reproducing, diagnosing, and fixing software bugs in real-world applications, including Apache and MySQL. Our results show that Dora (1) captures bugs and replays them with applications modified or reconfigured to produce additional debugging output for root cause diagnosis, (2) captures exploits and replays them with patched applications to validate that the patches successfully eliminate vulnerabilities, (3) records production workloads and replays them with patched applications to validate patches with realistic workloads, and (4) maintains low recording overhead on commodity multicore hardware, making it suitable for production systems.
Transparent mutable replay for multicore debugging and patch validation
ASPLOS '13: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems