Cooperative empirical failure avoidance for multithreaded programs

Published: 16 March 2013

Abstract

Concurrency errors in multithreaded programs are difficult to find and fix. We propose Aviso, a system for avoiding schedule-dependent failures. Aviso monitors events during a program's execution and, when a failure occurs, records a history of events from the failing execution. It uses this history to generate schedule constraints that perturb the order of events in the execution and thereby avoids schedules that lead to failures in future program executions. Aviso leverages scenarios where many instances of the same software run, using a statistical model of program behavior and experimentation to determine which constraints most effectively avoid failures. After implementing Aviso, we showed that it decreased failure rates for a variety of important desktop, server, and cloud applications by orders of magnitude, with an average overhead of less than 20% and, in some cases, as low as 5%.

Published in

ACM SIGPLAN Notices, Volume 48, Issue 4 (ASPLOS '13), April 2013, 540 pages. ISSN: 0362-1340. EISSN: 1558-1160. DOI: 10.1145/2499368

ASPLOS '13: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems, March 2013, 574 pages. ISBN: 9781450318709. DOI: 10.1145/2451116

Copyright © 2013 ACM

Publisher: Association for Computing Machinery, New York, NY, United States