Abstract
Debugging concurrent programs is known to be difficult because of scheduling non-determinism. Multiprocessor deterministic replay substantially assists debugging by making the program execution reproducible. However, for long-running executions the replay traces are huge and replay takes a long time, so debugging remains very challenging. We present LEAN, a new technique built on top of replay that significantly reduces the complexity of the replay trace and the replay time without losing determinism in reproducing concurrency bugs. The cornerstone of our work is a redundancy criterion that characterizes the redundant computation in a buggy trace. Based on this criterion, we have developed two novel techniques that automatically identify and remove redundant threads and instructions from the bug-reproducing execution. Our evaluation on several real-world concurrency bugs in large, complex server programs demonstrates that LEAN reduces the size, the number of threads, and the number of thread context switches of the replay trace by orders of magnitude, and accordingly greatly shortens the replay time.
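The idea of removing redundant threads from a buggy trace can be illustrated with a toy sketch. This is not LEAN's algorithm or its redundancy criterion, which the abstract does not spell out. It assumes a hypothetical trace format (a list of `(thread, operation)` pairs) and a hypothetical replay oracle `reproduces` that reports whether a candidate trace still triggers the bug. It then greedily drops every thread whose removal leaves the bug reproducible.

```python
from typing import Callable, List, Tuple

# Hypothetical trace event: (thread id, operation name).
Event = Tuple[str, str]


def reduce_threads(trace: List[Event],
                   reproduces: Callable[[List[Event]], bool]) -> List[Event]:
    """Greedily drop whole threads while the bug still reproduces.

    `reproduces` stands in for a deterministic replayer (an assumption of
    this sketch); it must return True when the trace still shows the bug.
    """
    current = list(trace)
    for thread in sorted({t for t, _ in trace}):
        candidate = [e for e in current if e[0] != thread]
        if reproduces(candidate):
            current = candidate  # the thread was redundant for this bug
    return current


def toy_bug(trace: List[Event]) -> bool:
    """Toy oracle: the bug shows up when T2's write lands between
    T1's check and T1's use (a made-up atomicity violation)."""
    try:
        check = trace.index(("T1", "check"))
        write = trace.index(("T2", "write"))
        use = trace.index(("T1", "use"))
    except ValueError:
        return False  # a required event is missing, so no bug
    return check < write < use


trace = [("T3", "log"), ("T1", "check"), ("T4", "read"),
         ("T2", "write"), ("T3", "log"), ("T1", "use")]
reduced = reduce_threads(trace, toy_bug)
print(reduced)
```

In this toy run, threads T3 and T4 are dropped because the oracle still reports the bug without them, while T1 and T2 are kept. A real system would need a replayer as the oracle and would also have to handle dependencies between threads, which this sketch ignores.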
LEAN: simplifying concurrency bug reproduction via replay-supported execution reduction
OOPSLA '12: Proceedings of the ACM international conference on Object oriented programming systems languages and applications