Abstract
Logging and replay are important for reproducing software failures and for recovering from them. Replaying a long execution is time-consuming, especially when replay is further integrated with runtime techniques that require expensive instrumentation, such as dependence detection. In this paper, we propose a technique to reduce a replay log while retaining its ability to reproduce a failure. While traditional logging records only system calls and signals, our technique leverages the compiler to selectively collect additional information on the fly. Upon a failure, the log can be reduced by analyzing the log itself. The collection is highly optimized: the additional runtime overhead of our technique, compared to a plain logging tool, is trivial (2.61% on average), and the size of the additional log is comparable to that of the original log. Substantial reduction can be achieved cost-effectively through a search-based algorithm. The reduced log is guaranteed to reproduce the failure.
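The abstract does not spell out the search itself. As a rough illustration only, the sketch below shows one generic way a search-based reduction could work: a delta-debugging-style loop that drops chunks of log entries and keeps a smaller log only when it still reproduces the failure. The names `reduce_log` and `reproduces`, and the string log entries, are hypothetical and not taken from the paper; in particular, the `reproduces` oracle stands in for whatever check decides that a candidate log still triggers the failure, whereas the paper reduces the log by analyzing the log itself with the additional compiler-collected information.

```python
from typing import Callable, List, Sequence


def reduce_log(entries: Sequence[str],
               reproduces: Callable[[List[str]], bool]) -> List[str]:
    """Shrink a log by chunk removal (a ddmin-style search, assumed here).

    `reproduces` is a hypothetical oracle: it returns True when the given
    candidate log still reproduces the failure.
    """
    current = list(entries)
    if not reproduces(current):
        raise ValueError("the full log must reproduce the failure")

    chunk = max(len(current) // 2, 1)
    while True:
        i = 0
        removed_any = False
        while i < len(current):
            # Try dropping the chunk that starts at position i.
            candidate = current[:i] + current[i + chunk:]
            if reproduces(candidate):
                current = candidate      # keep the smaller log, retry at i
                removed_any = True
            else:
                i += chunk               # this chunk is needed, move on
        if chunk == 1 and not removed_any:
            break                        # no single entry can be dropped
        chunk = max(chunk // 2, 1)
    return current


# Toy usage: the failure needs exactly two of the twenty logged events.
full_log = [f"syscall:{n}" for n in range(20)]
needed = {"syscall:3", "syscall:17"}
reduced = reduce_log(full_log, lambda log: needed.issubset(log))
print(reduced)
```

Every candidate is accepted only after the oracle confirms it, so the returned log reproduces the failure by construction. The price is one oracle call per candidate, which is why a cheap check matters for a search of this kind.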
Toward generating reducible replay logs
PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation