Abstract
In this paper we propose Recon, a new general approach to concurrency debugging. Recon goes beyond just detecting bugs, it also presents to the programmer short fragments of buggy execution schedules that illustrate how and why bugs happened. These fragments, called reconstructions, are inferred from inter-thread communication surrounding the root cause of a bug and significantly simplify the process of understanding bugs.
The key idea in Recon is to monitor executions and build graphs that encode inter-thread communication with enough context information to build reconstructions. Recon leverages reconstructions built from multiple application executions and uses machine learning to identify which ones illustrate the root cause of a bug. Recon's approach is general because it does not rely on heuristics specific to any type of bug, application, or programming model. Therefore, it is able to deal with single- and multiple-variable concurrency bugs regardless of their type (e.g., atomicity violation, ordering, etc). To make graph collection efficient, Recon employs selective monitoring and allows metadata information to be imprecise without compromising accuracy. With these optimizations, Recon's graph collection imposes overheads typically between 5x and 20x for both C/C++ and Java programs, with overheads as low as 13% in our experiments. We evaluate Recon with buggy applications, and show it produces reconstructions that include all code points involved in bugs' causes, and presents them in an accurate order. We include a case study of understanding and fixing a previously unresolved bug to showcase Recon's effectiveness.
- Z. Anderson, D. Gay, R. Ennals, and E. Brewer. SharC: Checking Data Sharing Strategies for Multithreaded C. In PLDI, 2008. Google Scholar
Digital Library
- C. Bienia, S. Kumar, J. P. Singh, and K. Li. The parsec benchmark suite: Characterization and architectural implications. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, October 2008. Google Scholar
Digital Library
- S. M. Blackburn et al. The DaCapo Benchmarks: Java Benchmarking Development and Analysis. In OOPSLA, 2006. Google Scholar
Digital Library
- H.-J. Boehm and S. V. Adve. Foundations of the C Concurrency Memory Model. In PLDI, 2008. Google Scholar
Digital Library
- S. Burckhardt, P. Kothari, M. Musuvathi, and S. Nagarakatte. A Randomized Scheduler with Probabilistic Guarantees of Finding Bugs. In ASPLOS, 2010. Google Scholar
Digital Library
- C. Flanagan and S. N. Freund. FastTrack: Efficient and Precise Dynamic Race Detection. In PLDI, 2009. Google Scholar
Digital Library
- C. Flanagan and S. N. Freund. The RoadRunner Dynamic Analysis Framework for Concurrent Programs. In PASTE, 2010. Google Scholar
Digital Library
- C. Flanagan, S. N. Freund, and J. Yi. Velodrome: A Sound and Complete Dynamic Atomicity Checker for Multithreaded Programs. In PLDI, 2008. Google Scholar
Digital Library
- M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. The WEKA Data Mining Software: An Update. SIGKDD Explorations, 2009. Google Scholar
Digital Library
- C. Hammer, J. Dolby, M. Vaziri, and F. Tip. Dynamic Detection of Atomic-Set-Serializability Violations. In ICSE, 2008. Google Scholar
Digital Library
- G. Jin, A. Thakur, B. Liblit, and S. Lu. Instrumentation and Sampling Strategies for Cooperative Concurrency Bug Isolation. In OOPSLA, 2010. Google Scholar
Digital Library
- P. Joshi and K. Sen. Predictive Typestate Checking of Multithreaded Java Programs. In ASE, 2008. Google Scholar
Digital Library
- K. Kawachiya, A. Koseki, and T. Onodera. Lock Reservation: Java Locks Can Mostly Do Without Atomic Operations. In OOPSLA, 2002. Google Scholar
Digital Library
- I. Kononenko. Estimating Attributes: Analysis and Extensions of RELIEF. In European Conference on Machine Learning, 1994. Google Scholar
Digital Library
- B. Liblit. Cooperative Bug Isolation, volume 4440 of Lecture Notes in Computer Science. Springer, 2007.Google Scholar
- S. Lu, S. Park, E. Seo, and Y. Zhou. Learning from Mistakes - A Comprehensive Study on Real World Concurrency Bug Characteristics. In ASPLOS, 2008. Google Scholar
Digital Library
- S. Lu, J. Tucek, F. Qin, and Y. Zhou. AVIO: Detecting Atomicity Violations via Access Interleaving Invariants. In ASPLOS, 2006. Google Scholar
Digital Library
- B. Lucia and L. Ceze. Finding Concurrency Bugs with Context-Aware Communication Graphs. In MICRO, 2009. Google Scholar
Digital Library
- C.-K. Luk et al. Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation. In PLDI, 2005. Google Scholar
Digital Library
- J. Manson, W. Pugh, and S. V. Adve. The Java Memory Model. In POPL, 2005. Google Scholar
Digital Library
- M. Musuvathi, S. Qadeer, T. Ball, G. Basler, P. A. Nainar, and I. Neamtiu. Finding and Reproducing Heisenbugs in Concurrent Programs. In OSDI, 2008. Google Scholar
Digital Library
- S. Park, S. Lu, and Y. Zhou. CTrigger: Exposing Atomicity Violation Bugs from Their Hiding Places. In ASPLOS, 2009. Google Scholar
Digital Library
- S. Park, R. W. Vuduc, and M. J. Harrold. Falcon: Fault Localization in Concurrent Programs. In ICSE, 2010. Google Scholar
Digital Library
- S. Park, Y. Zhou, W. Xiong, Z. Yin, R. Kaushik, K. H. Lee, and S. Lu. PRES: Probabilistic Replay with Execution Sketching on Multiprocessors. In SOSP, 2009. Google Scholar
Digital Library
- M. Ronsee and K. D. Bosschere. RecPlay: A Fully Integrated Practical Record/Replay System. ToCS, 1999. Google Scholar
Digital Library
- S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T. Anderson. Eraser: A Dynamic Data Race Detector for Multi-Threaded Programs. ToCS, 1997. Google Scholar
Digital Library
- Y. Shi, S. Park, Z. Yin, S. Lu, Y. Zhou, W. Chen, and W. Zheng. Do I Use the Wrong Definition?: DeFuse: Definition-Use Invariants for Detecting Concurrency and Sequential Bugs. In OOPSLA, 2010. Google Scholar
Digital Library
- L. A. Smith, J. M. Bull, and J. Obdrzálek. A parallel Java Grande benchmark suite. In Supercomputing, 2001. Google Scholar
Digital Library
- B. P. Wood, A. Sampson, L. Ceze, and D. Grossman. Composable Specifications for Structured Shared-Memory Communication. In OOPSLA, 2010. Google Scholar
Digital Library
- M. Xu, R. Bodík, and M. D. Hill. A Serializability Violation Detector for Shared-Memory Server Programs. In PLDI, June 2005. Google Scholar
Digital Library
- M. Xu, M. D. Hill, and R. Bodik. A Regulated Transitive Reduction (RTR) for Longer Memory Race Recording. In ASPLOS, 2006. Google Scholar
Digital Library
- J. Yu and S. Narayanasamy. A Case for an Interleaving Constrained Shared-Memory Multi-Processor. In ISCA, 2009. Google Scholar
Digital Library
- P. Zhou, R. Teodorescu, and Y. Zhou. HARD: Hardware-Assisted Lockset-based Race Detection. In HPCA, 2007. Google Scholar
Digital Library
Index Terms
Isolating and understanding concurrency errors using reconstructed execution fragments
Recommendations
Isolating and understanding concurrency errors using reconstructed execution fragments
PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and ImplementationIn this paper we propose Recon, a new general approach to concurrency debugging. Recon goes beyond just detecting bugs, it also presents to the programmer short fragments of buggy execution schedules that illustrate how and why bugs happened. These ...
Instrumentation and sampling strategies for cooperative concurrency bug isolation
OOPSLA '10Fixing concurrency bugs (or "crugs") is critical in modern software systems. Static analyses to find crugs such as data races and atomicity violations scale poorly, while dynamic approaches incur high run-time overheads. Crugs manifest only under ...
Instrumentation and sampling strategies for cooperative concurrency bug isolation
OOPSLA '10: Proceedings of the ACM international conference on Object oriented programming systems languages and applicationsFixing concurrency bugs (or "crugs") is critical in modern software systems. Static analyses to find crugs such as data races and atomicity violations scale poorly, while dynamic approaches incur high run-time overheads. Crugs manifest only under ...







Comments