skip to main content
article

Stabilizers: a modular checkpointing abstraction for concurrent functional programs

Published:16 September 2006Publication History
Skip Abstract Section

Abstract

Transient faults that arise in large-scale software systems can often be repaired by re-executing the code in which they occur. Ascribing a meaningful semantics for safe re-execution in multi-threaded code is not obvious, however. For a thread to correctly rexecute a region of code, it must ensure that all other threads which have witnessed its unwanted effects within that region are also reverted to a meaningful earlier state. If not done properly, data inconsistencies and other undesirable behavior may result. however, automatically determining what constitutes a consistent global checkpoint is not straightforward since thread interactions are a dynamic property of the program.In this paper, we present a safe and efficient checkpointing mechanism for Concurrent ML (CML) that can be used to recover from transient faults. We introduce a new linguistic abstraction called stabilizers that permits the specification of per-thread monitors and the restoration of globally consistent checkpoints. Safe global states are computed through lightweight monitoring of communication events among threads (e.g. message-passing operations or updates to shared variables).Our experimental results on several realistic, multithreaded, server-style CML applications, including a web server and a windowing toolkit, show that the overheads to use stabilizers are small, and lead us to conclude that they are a viable mechanism for defining safe checkpoints in concurrent functional programs.

References

  1. A. Adya, R. Gruber, B. Liskov, and U. Maheshwari. Efficient Optimistic Concurrency Control Using Loosely Synchronized Clocks. SIGMOD Record (ACM Special Interest Group on Management of Data), 24(2):23--34, June 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Saurabh Agarwal, Rahul Garg, Meeta S. Gupta, and Jose E. Moreira. Adaptive Incremental Checkpointing for Massively Parallel Systems. In ICS '04: Proceedings of the 18th annual international conference on Supercomputing, pages 277--286, New York, NY, USA, 2004. ACM Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Micah Beck, James S. Plank, and Gerry Kingsley. Compiler-Assisted Checkpointing. Technical report, University of Tennessee, Knoxville, TN, USA, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Greg Bronevetsky, Daniel Marques, Keshav Pingali, and Paul Stodghill. Automated Application-Level Checkpointing of MPI Programs. In PPoPP '03: Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming, pages 84--94, New York, NY, USA, 2003. ACM Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Greg Bronevetsky, Daniel Marques, Keshav Pingali, Peter Szwed, and Martin Schulz. Application-Level Checkpointing for Shared Memory Programs. In ASPLOS-XI: Proceedings of the 11th international conference on Architectural support for programming languages and operating systems, pages 235--247, New York, NY, USA, 2004. ACM Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. R. Bruni, H. Melgratti, and U. Montanari. Theoretical Foundations for Compensations in Flow Composition Languages. In POPL '05: Proceedings of the 32nd ACM SIGPLAN-SIGACT sysposium on Principles of programming languages, pages 209--220, New York, NY, USA, 2005. ACM Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. G. Candea, S. Kawamoto, Y. Fujiki, G. Friedman, and A. Fox. Microreboot - A Technique for Cheap Recovery. In 6th Symposium on Operating Systems Design and Implementation, San Francisco, California, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Yuqun Chen, James S. Plank, and Kai Li. CLIP: A Checkpointing Tool for Message-Passing Parallel Programs. In Supercomputing '97: Proceedings of the 1997 ACM/IEEE conference on Supercomputing, pages 1--11, New York, NY, USA, 1997. ACM Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Jan Christiansen and Frank Huch. Searching for Deadlocks while Debugging Concurrent Haskell Programs. In ICFP '04: Proceedings of the ninth ACM SIGPLAN international conference on Functional programming, pages 28--39, New York, NY, USA, 2004. ACM Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Panos K. Chrysanthis and Krithi Ramamritham. ACTA: the SAGA continues. In Database Transaction Models for Advanced Applications, pages 349--397. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. William R. Dieter and James E. Lumpp Jr. A User-level Checkpointing Library for POSIX Threads Programs. In FTCS '99: Proceedings of the Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing, page 224, Washington, DC, USA, 1999. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Kevin Donnelly and Matthew Fluet. Transactional events. In ICFP '06: Proceedings of the Eleventh ACM SIGPLAN International Conference on Functional Programming, New York, NY, USA, 2006. ACM Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. E.N. (Mootaz) Elnozahy, Lorenzo Alvisi, Yi-Min Wang, and David B. Johnson. A Survey of Rollback-Recovery Protocols in Message-Passing Systems. ACM Comput. Surv., 34(3):375--408, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. John Field and Carlos A. Varela. Transactors: a Programming Model for Maintaining Globally Consistent Distributed State in Unreliable Environments. In POPL '05: Proceedings of the 32nd ACM SIGPLAN-SIGACT sysposium on Principles of programming languages, pages 195--208, New York, NY, USA, 2005. ACM Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Matthew Flatt and Robert Bruce Findler. Kill-safe Synchronization Abstractions. In PLDI '04: Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation, pages 47--58, New York, NY, USA, 2004. ACM Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Jim Gray and Andreas Reuter. Transaction Processing. Morgan-Kaufmann, 1993.Google ScholarGoogle Scholar
  17. Tim Harris and Keir Fraser. Language support for lightweight transactions. In Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications, pages 388--402. ACM Press, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Tim Harris, Simon Marlow, Simon Peyton Jones, and Maurice Herlihy. Composable Memory Transactions. In ACM Conference on Principles and Practice of Parallel Programming, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Maurice Herlihy, Victor Luchangco, Mark Moir, and William N. Scherer, III. Software transactional memory for dynamic-sized data structures. In ACM Conference on Principles of Distributed Computing, pages 92--101, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. http://www.mlton.org.Google ScholarGoogle Scholar
  21. D. Hulse. On Page-Based Optimistic Process Checkpointing. In IWOOOS '95: Proceedings of the 4th International Workshop on Object-Orientation in Operating Systems, page 24, Washington, DC, USA, 1995. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Mangesh Kasbekar and Chita Das. Selective Checkpointing and Rollback in Multithreaded Distributed Systems. In 21st International Conference on Distributed Computing Systems, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. H.T. Kung and John T. Robinson. On Optimistic Methods for Concurrency Control. TODS, 6(2):213--226, 1981. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Kai Li, Jeffrey Naughton, and James Plank. Real-time Concurrent Checkpoint for Parallel Programs. In ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 79--88, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. John Reppy. Concurrent Programming in ML. Cambridge University Press, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Martin Rinard. Effective Fine-Grained Synchronization for Automatically Parallelized Programs Using Optimistic Synchronization Primitives. ACM Transactions on Computer Systems, 17(4):337--371, November 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Michael F. Ringenburg and Dan Grossman. Atomcaml: first-class atomicity via rollback. In ICFP '05: Proceedings of the Tenth ACM SIGPLAN International Conference on Functional Programming, pages 92--104, New York, NY, USA, 2005. ACM Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Asser N. Tantawi and Manfred Ruschitzka. Performance Analysis of Checkpointing Strategies. ACM Trans. Comput. Syst., 2(2):123--144, 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Andrew P. Tolmach and Andrew W. Appel. Debugging Standard ML Without Reverse Engineering. In LFP '90: Proceedings of the 1990 ACM conference on LISP and functional programming, pages 1--12, New York, NY, USA, 1990. ACM Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Andrew P. Tolmach and Andrew W. Appel. Debuggable Concurrency Extensions for Standard ML. In PADD '91: Proceedings of the 1991 ACM/ONR workshop on Parallel and distributed debugging, pages 120--131, New York, NY, USA, 1991. ACM Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Adam Welc, Suresh Jagannathan, and Antony Hosking. Safe futures for java. In Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications, pages 439--453. ACM Press, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Adam Welc, Suresh Jagannathan, and Antony L. Hosking. Transactional Monitors for Concurrent Objects. In European Conference on Object-Oriented Programming, pages 519--542, 2004.Google ScholarGoogle Scholar

Index Terms

  1. Stabilizers: a modular checkpointing abstraction for concurrent functional programs

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in

            Full Access

            • Published in

              cover image ACM SIGPLAN Notices
              ACM SIGPLAN Notices  Volume 41, Issue 9
              Proceedings of the 2006 ICFP conference
              September 2006
              296 pages
              ISSN:0362-1340
              EISSN:1558-1160
              DOI:10.1145/1160074
              Issue’s Table of Contents
              • cover image ACM Conferences
                ICFP '06: Proceedings of the eleventh ACM SIGPLAN international conference on Functional programming
                September 2006
                308 pages
                ISBN:1595933093
                DOI:10.1145/1159803
                • General Chair:
                • John Reppy,
                • Program Chair:
                • Julia Lawall

              Copyright © 2006 ACM

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 16 September 2006

              Check for updates

              Qualifiers

              • article

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader
            About Cookies On This Site

            We use cookies to ensure that we give you the best experience on our website.

            Learn more

            Got it!