Abstract
Many concurrency bugs are hidden in deployed software and cause severe failures for end-users. When they finally manifest and become known by developers, they are difficult to fix correctly. To support end-users, we need techniques that help software survive hidden concurrency bugs during production runs. To help developers, we need techniques that fix exposed concurrency bugs.
The state-of-the-art techniques on concurrency-bug fixing and survival only satisfy a subset of four important properties: compatibility, correctness, generality, and performance.We aim to develop a system that satisfies all of these four properties. To achieve this goal, we leverage two observations: (1) rolling back a single thread is sufficient to recover from most concurrency-bug failures; (2) reexecuting an idempotent region, which requires no memory-state checkpoint, is sufficient to recover from many concurrency-bug failures. Our system ConAir includes a static analysis component that automatically identifies potential failure sites, a static analysis component that automatically identifies the idempotent code regions around every failure site, and a code-transformation component that inserts rollback-recovery code around the identified idempotent regions.
We evaluated ConAir on 10 real-world concurrency bugs in widely used C/C++ open-source applications. These bugs cover different types of failure symptoms and root causes. Quantitatively, ConAir helps software survive failures caused by all of these bugs with negligible run-time overhead (<1%) and short recovery time. Qualitatively, ConAir can help recover from failures caused by unknown bugs. It guarantees that program semantics remain unchanged and requires no change to operating systems or hardware.
- G. Altekar and I. Stoica. ODR: output-deterministic replay for multicore debugging. In SOSP, 2009. Google Scholar
Digital Library
- A. Aviram, S.-C. Weng, S. Hu, and B. Ford. Efficient system-enforced deterministic parallelism. In OSDI, 2010. Google Scholar
Digital Library
- T. Bergan, N. Hunt, L. Ceze, and S. D. Gribble. Deterministic process groups in dOS. In OSDI, 2010. Google Scholar
Digital Library
- G. Candea, S. Kawamoto, Y. Fujiki, G. Friedman, and A. Fox. Microreboot - a technique for cheap recovery. In OSDI, 2004. Google Scholar
Digital Library
- L. Chew and D. Lie. Kivati: Fast detection and prevention of atomicity violations. In EuroSys, 2010. Google Scholar
Digital Library
- R. Chugh, J. W. Voung, R. Jhala, and S. Lerner. Dataflow analysis for concurrent programs using datarace detection. In PLDI, 2008. Google Scholar
Digital Library
- H. Cui, J. Wu, J. Gallagher, H. Guo, and J. Yang. Efficient deterministic multithreading through schedule relaxation. In SOSP, 2011. Google Scholar
Digital Library
- R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck. Efficiently computing static single assignment form and the control dependence graph. Trans. Program. Lang. Syst., 13 (4), Oct. 1991. Google Scholar
Digital Library
- M. de Kruijf and K. Sankaralingam. Idempotent processor architecture. In MICRO, 2011. Google Scholar
Digital Library
- M. de Kruijf and K. Sankaralingam. Idempotent code generation: Implementation, analysis, and evaluation. In CGO, 2013.Google Scholar
Digital Library
- M. de Kruijf, S. Nomura, and K. Sankaralingam. Relax: an architectural framework for software recovery of hardware faults. In ISCA, 2010. Google Scholar
Digital Library
- M. de Kruijf, K. Sankaralingam, and S. Jha. Static analysis and compiler design for idempotent processing. In PLDI, 2012. Google Scholar
Digital Library
- Y. h. Eom and B. Demsky. Self-stabilizing java. In PLDI, 2012. Google Scholar
Digital Library
- J. Erickson, M. Musuvathi, S. Burckhardt, and K. Olynyk. Effective data-race detection for the kernel. In OSDI, 2010. Google Scholar
Digital Library
- M. Ernst, A. Czeisler, W. G. Griswold, and D. Notkin. Quickly detecting relevant program invariants. In ICSE, 2000. Google Scholar
Digital Library
- S. Feng, S. Gupta, A. Ansari, S. A. Mahlke, and D. I. August. Encore: low-cost, fine-grained transient fault recovery. In MICRO, 2011. Google Scholar
Digital Library
- C. Flanagan and S. N. Freund. Atomizer: a dynamic atomicity checker for multithreaded programs. In POPL, 2004. Google Scholar
Digital Library
- C. Flanagan and S. N. Freund. Fasttrack: efficient and precise dynamic race detection. In PLDI, 2009. Google Scholar
Digital Library
- Q. Gao, W. Zhang, Z. Chen, M. Zheng, and F. Qin. 2ndStrike: toward manifesting hidden concurrency typestate bugs. In ASPLOS, 2011. Google Scholar
Digital Library
- P. Godefroid and N. Nagappani. Concurrency at Microsoft -- an exploratory survey. Technical report, MSR-TR-2008--75, Microsoft Research, May 2008.Google Scholar
- 006)}krste.ics06M. Hampton and K. Asanović. Implementing virtual memory in a vector processor with software restart markers. In ICS, 2006.Google Scholar
Digital Library
- D. R. Hower, P. Montesinos, L. Ceze, M. D. Hill, and J. Torrellas. Two hardware-based approaches for deterministic multiprocessor replay. Commun. ACM, 52 (6), June 2009. Google Scholar
Digital Library
- G. Jin, L. Song, W. Zhang, S. Lu, and B. Liblit. Automated atomicity-violation fixing. In PLDI, 2011. Google Scholar
Digital Library
- H. Jula, D. Tralamazza, C. Zamfir, and G. Candea. Deadlock immunity: Enabling systems to defend against deadlocks. In OSDI, 2008. Google Scholar
Digital Library
- S. W. Kim, C.-L. Ooi, R. Eigenmann, B. Falsafi, and T. N. Vijaykumar. Exploiting reference idempotency to reduce speculative storage overflow. ACM Trans. Program. Lang. Syst., 28 (5), Sept. 2006. Google Scholar
Digital Library
- S. T. King, G. W. Dunlap, and P. M. Chen. Operating systems with time-traveling virtual machines. In Usenix, 2005. Google Scholar
Digital Library
- C. Lattner and V. Adve. LLVM: A compilation framework for lifelong program analysis & transformation. In CGO, 2004. Google Scholar
Digital Library
- D. Lee, B. Wester, K. Veeraraghavan, S. Narayanasamy, P. M. Chen, and J. Flinn. Respec: efficient online multiprocessor replayvia speculation and external determinism. In ASPLOS, 2010. Google Scholar
Digital Library
- a}atomrace.padtad08Z. Letko, T. Vojnar, and B. Krena. AtomRace: data race and atomicity violation detector and healer. In PADTAD, 2008.Google Scholar
- N. G. Leveson and C. S. Turner. An investigation of the therac-25 accidents. Computer, 26 (7): 18--41, 1993. Google Scholar
Digital Library
- Z. Li, L. Tan, X. Wang, Y. Zhou, and C. Zhai. An empirical study of bug characteristics in modern open source software. In ASID, 2006. Google Scholar
Digital Library
- T. Liu, C. Curtsinger, and E. D. Berger. Dthreads: efficient deterministic multithreading. In SOSP, 2011. Google Scholar
Digital Library
- S. Lu, J. Tucek, F. Qin, and Y. Zhou. AVIO: detecting atomicity violations via access interleaving invariants. In ASPLOS, 2006. Google Scholar
Digital Library
- S. Lu, S. Park, C. Hu, X. Ma, W. Jiang, Z. Li, R. A. Popa, and Y. Zhou. MUVI: Automatically inferring multi-variable access correlations and detecting related semantic and concurrency bugs. In SOSP, 2007. Google Scholar
Digital Library
- S. Lu, S. Park, E. Seo, and Y. Zhou. Learning from mistakes -- a comprehensive study of real world concurrency bug characteristics. In ASPLOS, 2008. Google Scholar
Digital Library
- B. Lucia and L. Ceze. Finding concurrency bugs with context-aware communication graphs. In MICRO, 2009. Google Scholar
Digital Library
- B. Lucia, J. Devietti, K. Strauss, and L. Ceze. Atom-aid: Detecting and surviving atomicity violations. In ISCA, 2008. Google Scholar
Digital Library
- S. A. Mahlke, W. Y. Chen, R. A. Bringmann, R. E. Hank, W.-M. W. Hwu, B. R. Rau, and M. S. Schlansker. Sentinel scheduling: a model for compiler-controlled speculative execution. ACM Trans. Comput. Syst., 11 (4), Nov. 1993. Google Scholar
Digital Library
- S. Misailovic, D. Kim, and M. Rinard. Parallelizing sequential programs with statistical accuracy tests. MIT-CSAIL-TR-2010-038.Google Scholar
- MySQL. Mysql 5.6 reference manual. http://dev.mysql.com/doc/refman/5.6/en/.Google Scholar
- M. Olszewski, J. Ansel, and S. Amarasinghe. Kendo: efficient deterministic multithreading in software. In ASPLOS, 2009. Google Scholar
Digital Library
- }nasdaqPCWorld. Nasdaq's Facebook Glitch Came From Race Conditions. http://www.pcworld.com/businesscenter/article/255911/ nasdaqs_facebook_glitch_came_from_race_conditions.html.Google Scholar
- S. Qi, N. Otsuki, L. O. Nogueira, A. Muzahid, and J. Torrellas. Pacman: Tolerating asymmetric data races with unintrusive hardware. In HPCA, 2012. Google Scholar
Digital Library
- F. Qin, J. Tucek, J. Sundaresan, and Y. Zhou. Rx: Treating bugs as allergies c a safe method to survive software failures. In SOSP, 2005. Google Scholar
Digital Library
- S. K. Rajamani, G. Ramalingam, V. P. Ranganath, and K. Vaswani. Isolator: dynamically ensuring isolation in comcurrent programs. In ASPLOS, 2009. Google Scholar
Digital Library
- P. Ratanaworabhan, M. Burtscher, D. Kirovski, B. G. Zorn, R. Nagpal, and K. Pattabiraman. Detecting and tolerating asymmetric races. In PPOPP, 2009. Google Scholar
Digital Library
- S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T. Anderson. Eraser: A dynamic data race detector for multithreaded programs. TOCS, 1997. Google Scholar
Digital Library
- SecurityFocus. Software bug contributed to blackout. http://www.securityfocus.com/news/8016.Google Scholar
- Y. Shi, S. Park, Z. Yin, S. Lu, Y. Zhou, W. Chen, and W. Zheng. Do I use the wrong definition?: DefUse: definition-use invariants for detecting concurrency and sequential bugs. In OOPSLA, 2010. Google Scholar
Digital Library
- S. Sidiroglou, O. Laadan, C. Perez, N. Viennot, J. Nieh, and A. D. Keromytis. Assure: automatic software self-healing using rescue points. In ASPLOS, 2009. Google Scholar
Digital Library
- G. Upadhyaya, S. P. Midkiff, and V. S. Pai. Automatic atomic region identification in shared memory SPMD programs. In OOPSLA, 2010. Google Scholar
Digital Library
- M. Vaziri, F. Tip, and J. Dolby. Associating synchronization constraints with data in an object-oriented language. In POPL, 2006. Google Scholar
Digital Library
- K. Veeraraghavan, P. M. Chen, J. Flinn, and S. Narayanasamy. Detecting and surviving data races using complementary schedules. In SOSP, 2011. Google Scholar
Digital Library
- H. Volos, A. J. Tack, M. M. Swift, and S. Lu. Applying transactional memory to concurrency bugs. In ASPLOS, 2012. Google Scholar
Digital Library
- D. Weeratunge, X. Zhang, and S. Jagannathan. Accentuating the positive: atomicity inference and enforcement using correct executions. In OOPSLA, 2011. Google Scholar
Digital Library
- Z. Yin, D. Yuan, Y. Zhou, S. Pasupathy, and L. N. Bairavasundaram. How do fixes become bugs? In FSE, 2011. Google Scholar
Digital Library
- J. Yu and S. Narayanasamy. A case for an interleaving constrained shared-memory multi-processor. In ISCA, 2009. Google Scholar
Digital Library
- J. Yu and S. Narayanasamy. Tolerating concurrency bugs using transactions as lifeguards. In MICRO, 2010. Google Scholar
Digital Library
- Y. Yu, T. Rodeheffer, and W. Chen. RaceTrack: Efficient detection of data race conditions via adaptive tracking. In SOSP, 2005. Google Scholar
Digital Library
- W. Zhang, C. Sun, and S. Lu. ConMem: Detecting severe concurrency bugs through an effect-oriented approach. In ASPLOS, 2010. Google Scholar
Digital Library
- W. Zhang, J. Lim, R. Olichandran, J. Scherpelz, G. Jin, S. Lu, and T. Reps. ConSeq: Detecting concurrency bugs through sequential errors. In ASPLOS, 2011. Google Scholar
Digital Library
Index Terms
ConAir: featherweight concurrency bug recovery via single-threaded idempotent execution
Recommendations
ConAir: featherweight concurrency bug recovery via single-threaded idempotent execution
ASPLOS '13: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systemsMany concurrency bugs are hidden in deployed software and cause severe failures for end-users. When they finally manifest and become known by developers, they are difficult to fix correctly. To support end-users, we need techniques that help software ...
ConAir: featherweight concurrency bug recovery via single-threaded idempotent execution
ASPLOS '13Many concurrency bugs are hidden in deployed software and cause severe failures for end-users. When they finally manifest and become known by developers, they are difficult to fix correctly. To support end-users, we need techniques that help software ...
Automatically detecting and fixing concurrency bugs in go software systems
ASPLOS '21: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating SystemsGo is a statically typed programming language designed for efficient and reliable concurrent programming. For this purpose, Go provides lightweight goroutines and recommends passing messages using channels as a less error-prone means of thread ...







Comments