Abstract
Debugging concurrent programs is difficult, primarily because the non-determinism inherent in scheduler interleavings makes it hard to reproduce bugs that manifest only under certain interleavings. The problem is exacerbated in multi-core environments, where there are multiple schedulers, one for each core. In this paper, we propose a reproduction technique for concurrent programs that execute on multi-core platforms. Our technique performs a lightweight analysis of a failing execution that occurs in a multi-core environment and uses the result of the analysis to enable reproduction of the bug on a single-core system, under the control of a deterministic scheduler.
More specifically, our approach automatically identifies the execution point in the re-execution that corresponds to the failure point. It does so by analyzing the failure core dump and leveraging a technique called execution indexing, which identifies a related point in the re-execution. By generating a core dump at this point and comparing the two dumps, we are able to guide a search algorithm to efficiently generate a failure-inducing schedule. Our experiments show that our technique is highly effective and has reasonable overhead.
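The idea of letting dump differences guide the schedule search can be illustrated with a toy model. The sketch below is not the paper's implementation: the two-thread counter program, the dictionary-based "dump", the `dump_distance` metric, and the adjacent-swap neighborhood are all assumptions chosen for illustration, and the dumps are compared at the end of the run rather than at a point aligned by execution indexing.

```python
import heapq

# Hypothetical toy program: each thread performs a non-atomic increment.
def read(shared, local):
    local["tmp"] = shared["counter"]      # step 1: read the shared value

def write(shared, local):
    shared["counter"] = local["tmp"] + 1  # step 2: write back tmp + 1

THREADS = [[read, write], [read, write]]

def run(schedule):
    """Execute one step per schedule entry on a single 'core', in the
    given thread order, and return a flat dictionary acting as the dump."""
    shared = {"counter": 0}
    locals_ = [{} for _ in THREADS]
    pc = [0] * len(THREADS)
    for tid in schedule:
        THREADS[tid][pc[tid]](shared, locals_[tid])
        pc[tid] += 1
    dump = dict(shared)
    for tid, loc in enumerate(locals_):
        for name, value in loc.items():
            dump[f"t{tid}.{name}"] = value
    return dump

def dump_distance(a, b):
    """Number of variables whose values differ between two dumps."""
    return sum(1 for k in set(a) | set(b) if a.get(k) != b.get(k))

def neighbors(schedule):
    """Schedules obtained by swapping two adjacent steps of different
    threads; per-thread step counts are preserved, so each stays valid."""
    for i in range(len(schedule) - 1):
        if schedule[i] != schedule[i + 1]:
            s = list(schedule)
            s[i], s[i + 1] = s[i + 1], s[i]
            yield tuple(s)

def search(failure_dump, start):
    """Best-first search that always expands the schedule whose dump is
    closest to the failure dump; returns a schedule with distance 0."""
    seen = {start}
    heap = [(dump_distance(run(start), failure_dump), start)]
    while heap:
        dist, sched = heapq.heappop(heap)
        if dist == 0:
            return sched
        for cand in neighbors(sched):
            if cand not in seen:
                seen.add(cand)
                d = dump_distance(run(cand), failure_dump)
                heapq.heappush(heap, (d, cand))
    return None

# Stand-in for the multi-core failure: both threads read before either writes,
# so one update is lost and the counter ends at 1 instead of 2.
failure_dump = {"counter": 1, "t0.tmp": 0, "t1.tmp": 0}
found = search(failure_dump, start=(0, 0, 1, 1))
print(found, run(found))
```

Starting from the non-preemptive schedule `(0, 0, 1, 1)`, the search reorders steps until the re-execution dump matches the failure dump. In a real setting the state space is far larger, which is why a distance signal derived from the dumps is useful for pruning.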
Analyzing multicore dumps to facilitate concurrency bug reproduction
ASPLOS XV: Proceedings of the Fifteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '10)