research-article

ConAir: featherweight concurrency bug recovery via single-threaded idempotent execution

Published: 16 March 2013

Abstract

Many concurrency bugs are hidden in deployed software and cause severe failures for end-users. When they finally manifest and become known by developers, they are difficult to fix correctly. To support end-users, we need techniques that help software survive hidden concurrency bugs during production runs. To help developers, we need techniques that fix exposed concurrency bugs.

The state-of-the-art techniques on concurrency-bug fixing and survival only satisfy a subset of four important properties: compatibility, correctness, generality, and performance. We aim to develop a system that satisfies all of these four properties. To achieve this goal, we leverage two observations: (1) rolling back a single thread is sufficient to recover from most concurrency-bug failures; (2) reexecuting an idempotent region, which requires no memory-state checkpoint, is sufficient to recover from many concurrency-bug failures. Our system ConAir includes a static analysis component that automatically identifies potential failure sites, a static analysis component that automatically identifies the idempotent code regions around every failure site, and a code-transformation component that inserts rollback-recovery code around the identified idempotent regions.
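The two observations above can be illustrated with a small sketch. It is not ConAir's implementation (ConAir transforms C/C++ programs at compile time); it is a hand-written Python analogue under stated assumptions. The names `FailureSite`, `run_idempotent_region`, `read_port`, `initializer`, and the `shared` dictionary are all hypothetical. The region only reads shared state and writes locals, so "rolling back" the failing thread amounts to discarding those locals and running the region again, with no memory checkpoint and no involvement of other threads.

```python
import threading
import time


class FailureSite(Exception):
    """Raised where the original program would have failed (hypothetical name)."""


def run_idempotent_region(region, max_retries=1000, backoff_s=0.001):
    """Re-execute `region` in the current thread until its failure check passes.

    Assumption: `region` is idempotent. It may read shared state but writes
    only its own locals, so no memory-state checkpoint is needed.
    Returns a (result, retries) pair.
    """
    for retries in range(max_retries + 1):
        try:
            return region(), retries
        except FailureSite:
            # Rollback is just dropping the region's locals. Other threads
            # keep running, which gives a late write time to land.
            time.sleep(backoff_s)
    # Recovery gave up: surface the original failure unchanged.
    raise FailureSite("retry limit reached")


# Shared state written late by another thread (an order violation).
shared = {"config": None}


def initializer():
    time.sleep(0.05)  # simulate the write arriving later than the reader expects
    shared["config"] = {"port": 8080}


def read_port():
    cfg = shared["config"]  # shared read; no shared write in this region
    if cfg is None:         # failure site: stands in for a null dereference
        raise FailureSite("config used before initialization")
    return cfg["port"]


def demo():
    shared["config"] = None
    writer = threading.Thread(target=initializer)
    writer.start()
    port, retries = run_idempotent_region(read_port)
    writer.join()
    return port, retries


if __name__ == "__main__":
    print(demo())
```

Only the reading thread is re-executed here; the writer is never paused or rolled back. If the retry limit is reached, the original failure is raised, so the program's behavior on an unrecoverable failure is the same as without the recovery wrapper.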

We evaluated ConAir on 10 real-world concurrency bugs in widely used C/C++ open-source applications. These bugs cover different types of failure symptoms and root causes. Quantitatively, ConAir helps software survive failures caused by all of these bugs with negligible run-time overhead (<1%) and short recovery time. Qualitatively, ConAir can help recover from failures caused by unknown bugs. It guarantees that program semantics remain unchanged and requires no change to operating systems or hardware.

Published in

ACM SIGPLAN Notices, Volume 48, Issue 4
ASPLOS '13
April 2013
540 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2499368

ASPLOS '13: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
March 2013
574 pages
ISBN: 9781450318709
DOI: 10.1145/2451116

Copyright © 2013 ACM

Publisher

Association for Computing Machinery
New York, NY, United States
