skip to main content
article
Public Access

PSync: a partially synchronous language for fault-tolerant distributed algorithms

Published:11 January 2016Publication History
Skip Abstract Section

Abstract

Fault-tolerant distributed algorithms play an important role in many critical/high-availability applications. These algorithms are notoriously difficult to implement correctly, due to asynchronous communication and the occurrence of faults, such as the network dropping messages or computers crashing. We introduce PSync, a domain specific language based on the Heard-Of model, which views asynchronous faulty systems as synchronous ones with an adversarial environment that simulates asynchrony and faults by dropping messages. We define a runtime system for PSync that efficiently executes on asynchronous networks. We formalise the relation between the runtime system and PSync in terms of observational refinement. The high-level lockstep abstraction introduced by PSync simplifies the design and implementation of fault-tolerant distributed algorithms and enables automated formal verification. We have implemented an embedding of PSync in the Scala programming language with a runtime system for partially synchronous networks. We show the applicability of PSync by implementing several important fault-tolerant distributed algorithms and we compare the implementation of consensus algorithms in PSync against implementations in other languages in terms of code size, runtime efficiency, and verification.

References

  1. P. A. Abdulla, A. Collomb-Annichini, A. Bouajjani, and B. Jonsson. Using Forward Reachability Analysis for Verification of Lossy Channel Systems. FMSD, 25(1):39–65, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Y. Afek and E. Gafni. Asynchrony from synchrony. In D. Frey, M. Raynal, S. Sarkar, R. K. Shyamasundar, and P. Sinha, editors, Distributed Computing and Networking, 14th International Conference, ICDCN 2013, Mumbai, India, January 3-6, 2013. Proceedings, pages 225–239, 2013.Google ScholarGoogle Scholar
  3. S. F. Allen, R. L. Constable, R. Eaton, C. Kreitz, and L. Lorigo. The nuprl open logical environment. In D. A. McAllester, editor, Automated Deduction - CADE-17, 17th International Conference on Automated Deduction, Pittsburgh, PA, USA, June 17-20, 2000, Proceedings, pages 170–176, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. P. Alvaro, T. Condie, N. Conway, J. M. Hellerstein, and R. Sears. I do declare: consensus in a logic language. Operating Systems Review, 43 (4):25–30, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. P. Alvaro, N. Conway, J. Hellerstein, and W. R. Marczak. Consistency analysis in bloom: a CALM and collected approach. In CIDR 2011, Fifth Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 9-12, 2011, Online Proceedings, pages 249–260, 2011.Google ScholarGoogle Scholar
  6. Y. Amir and J. Kirsch. Paxos for system builders: An overview. In Workshop on Large-Scale Distributed Systems and Middleware (LADIS 2008), Yorktown, NY, September 2008, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. K. R. Apt and D. Kozen. Limits for automatic verification of finitestate concurrent systems. Inf. Process. Lett., 22(6):307–309, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. J. L. Armstrong. The development of erlang. In S. L. P. Jones, M. Tofte, and A. M. Berman, editors, Proceedings of the 1997 ACM SIGPLAN International Conference on Functional Programming (ICFP ’97), Amsterdam, The Netherlands, June 9-11, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. , pages 196–203. ACM, 1997.Google ScholarGoogle Scholar
  10. M. P. Ashley-Rollman, P. Lee, S. C. Goldstein, P. Pillai, and J. Campbell. A language for large ensembles of independently executing nodes. In Logic Programming, 25th International Conference, ICLP 2009, Pasadena, CA, USA, July 14-17, 2009. Proceedings, pages 265– 280, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. S. Basu, T. Bultan, and M. Ouederni. Synchronizability for verification of asynchronously communicating systems. In V. Kuncak and A. Rybalchenko, editors, Verification, Model Checking, and Abstract Interpretation - 13th International Conference, VMCAI 2012, Philadelphia, PA, USA, January 22-24, 2012. Proceedings, pages 56–71, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. M. Ben-Or. Another advantage of free choice (extended abstract): Completely asynchronous agreement protocols. In PODC, pages 27– 30. ACM, 1983. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. M. Biely, B. Charron-Bost, A. Gaillard, M. Hutle, A. Schiper, and J. Widder. Tolerating corrupted communication. In PODC, pages 244–253, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. M. Biely, Z. Milosevic, N. Santos, and A. Schiper. S-paxos: Offloading the leader for high throughput state machine replication. In IEEE 31st Symposium on Reliable Distributed Systems, SRDS 2012, Irvine, CA, USA, October 8-11, 2012, pages 111–120, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. M. Biely, P. Delgado, Z. Milosevic, and A. Schiper. Distal: A framework for implementing fault-tolerant distributed algorithms. In 2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Budapest, Hungary, June 24-27, 2013, pages 1–8, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. D. Brand and P. Zafiropulo. On Communicating Finite-State Machines. J. ACM, 30(2):323–342, 1983. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. M. Burrows. The chubby lock service for loosely-coupled distributed systems. In OSDI, Berkeley, CA, USA, 2006. USENIX Association. ISBN 1-931971-47-1. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. C. Cachin, R. Guerraoui, and L. Rodrigues. Reliable broadcast. In Introduction to Reliable and Secure Distributed Programming, pages 73–135. Springer Berlin Heidelberg, 2011. ISBN 978-3-642-15259-7.Google ScholarGoogle Scholar
  19. M. Castro and B. Liskov. Practical byzantine fault tolerance. In M. I. Seltzer and P. J. Leach, editors, Proceedings of the Third USENIX Symposium on Operating Systems Design and Implementation (OSDI), New Orleans, Louisiana, USA, February 22-25, 1999, pages 173–186. USENIX Association, 1999.. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. T. D. Chandra and S. Toueg. Unreliable failure detectors for reliable distributed systems. J. ACM, 43(2):225–267, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. T. D. Chandra, R. Griesemer, and J. Redstone. Paxos made live: An engineering perspective. In Proceedings of the Twenty-sixth Annual ACM Symposium on Principles of Distributed Computing, PODC ’07, pages 398–407, New York, NY, USA, 2007. ACM. ISBN 978-1- 59593-616-5. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. M. Chaouch-Saad, B. Charron-Bost, and S. Merz. A reduction theorem for the verification of round-based distributed algorithms. In RP, volume 5797 of LNCS, pages 93–106, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. B. Charron-Bost and S. Merz. Formal verification of a consensus algorithm in the heard-of model. Int. J. Software and Informatics, 3(2-3):273–303, 2009.Google ScholarGoogle Scholar
  24. B. Charron-Bost and A. Schiper. The heard-of model: computing in distributed systems with benign faults. Distributed Computing, 22(1): 49–71, 2009.Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. B. Charron-Bost, H. Debrat, and S. Merz. Formal verification of consensus algorithms tolerating malicious faults. In SSS, pages 120– 134. Springer, 2011. ISBN 978-3-642-24549-7. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. S. Chaudhuri. More choices allow more faults: Set consensus problems in totally asynchronous systems. Information and Computation, 105(1):132 – 158, 1993. ISSN 0890-5401. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. A. Clement, E. L. Wong, L. Alvisi, M. Dahlin, and M. Marchetti. Making byzantine fault tolerant systems tolerate byzantine faults. In J. Rexford and E. G. Sirer, editors, Proceedings of the 6th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2009, April 22-24, 2009, Boston, MA, USA, pages 153–168. USENIX Association, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. H. Debrat and S. Merz. Verifying fault-tolerant distributed algorithms in the heard-of model. Archive of Formal Proofs, 2012.Google ScholarGoogle Scholar
  29. A. Desai, P. Garg, and P. Madhusudan. Natural proofs for asynchronous programs using almost-synchronous reductions. In A. P. Black and T. D. Millstein, editors, Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, OOPSLA 2014, part of SPLASH 2014, Portland, OR, USA, October 20-24, 2014, pages 709––725, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Dialyzer. http://www.erlang.org/doc/man/dialyzer.html.Google ScholarGoogle Scholar
  31. D. Dolev and E. N. Hoch. Byzantine self-stabilizing pulse in a bounded-delay model. In T. Masuzawa and S. Tixeuil, editors, Stabilization, Safety, and Security of Distributed Systems, 9th International Symposium, SSS 2007, Paris, France, November 14-16, 2007, Proceedings, pages 234–252. Springer, 2007.. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. D. Dolev, C. Dwork, and L. Stockmeyer. On the minimal synchronism needed for distributed consensus. In Foundations of Computer Science, 1983, 24th Annual Symposium on, pages 393–402, 1983.. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. E. D’Osualdo, J. Kochems, and C. L. Ong. Automatic verification of erlang-style concurrency. In Static Analysis - 20th International Symposium, SAS 2013, Seattle, WA, USA, June 20-22, 2013. Proceedings, pages 454–476, 2013.Google ScholarGoogle Scholar
  34. C. Dragoi, T. A. Henzinger, H. Veith, J. Widder, and D. Zufferey. A logic-based framework for verifying consensus algorithms. In K. L. McMillan and X. Rival, editors, VMCAI, pages 161–181. Springer, 2014.Google ScholarGoogle Scholar
  35. C. Dwork, N. Lynch, and L. Stockmeyer. Consensus in the presence of partial synchrony. JACM, 35(2):288–323, Apr. 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. T. Elrad and N. Francez. Decomposition of distributed programs into communication-closed layers. Sci. Comput. Program., 2(3):155–173, 1982.Google ScholarGoogle ScholarCross RefCross Ref
  37. J. M. Falerio, S. K. Rajamani, K. Rajan, G. Ramalingam, and K. Vaswani. Generalized lattice agreement. In ACM Symposium on Principles of Distributed Computing, PODC ’12, Funchal, Madeira, Portugal, July 16-18, 2012, pages 125–134, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. I. Filipovic, P. W. O’Hearn, N. Rinetzky, and H. Yang. Abstraction for concurrent objects. Theor. Comput. Sci., 411(51-52):4379–4398, 2010.. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. M. J. Fischer, N. A. Lynch, and M. S. Paterson. Impossibility of distributed consensus with one faulty process. J. ACM, 32(2):374– 382, Apr. 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. L. Fredlund and H. Svensson. Mcerlang: a model checker for a distributed functional programming language. In Proceedings of the 12th ACM SIGPLAN International Conference on Functional Programming, ICFP 2007, Freiburg, Germany, October 1-3, 2007, pages 125– 136, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. E. Gafni. Round-by-round fault detectors: Unifying synchrony and asynchrony (extended abstract). In B. A. Coan and Y. Afek, editors, Proceedings of the Seventeenth Annual ACM Symposium on Principles of Distributed Computing, PODC ’98, Puerto Vallarta, Mexico, June 28 - July 2, 1998, pages 143–152, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. J. Gray. Notes on data base operating systems. In R. Bayer, R. Graham, and G. Seegmüller, editors, Operating Systems, volume 60 of Lecture Notes in Computer Science, pages 393–481. Springer Berlin Heidelberg, 1978. ISBN 978-3-540-08755-7. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. R. Guerraoui and L. Rodrigues. Introduction to Reliable Distributed Programming. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006. ISBN 3540288457. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. P. Haller and M. Odersky. Scala actors: Unifying thread-based and event-based programming. Theor. Comput. Sci., 410(2-3):202–220, 2009.. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. C. Hewitt, P. Bishop, and R. Steiger. A universal modular ACTOR formalism for artificial intelligence. In Proceedings of the 3rd International Joint Conference on Artificial Intelligence. Standford, CA, August 1973, pages 235–245, 1973. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. C. A. R. Hoare. Communicating sequential processes. Commun. ACM, 21(8):666–677, 1978. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. M. Hutle and A. Schiper. Communication predicates: A high-level abstraction for coping with transient and dynamic faults. In DSN, pages 92–101, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. A. John, I. Konnov, U. Schmid, H. Veith, and J. Widder. Parameterized model checking of fault-tolerant distributed algorithms by abstraction. In FMCAD, pages 201–209, 2013.Google ScholarGoogle ScholarCross RefCross Ref
  49. F. P. Junqueira, B. C. Reed, and M. Serafini. Zab: High-performance broadcast for primary-backup systems. In Proceedings of the 2011 IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2011, Hong Kong, China, June 27-30 2011, pages 245– 256. IEEE, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. C. E. Killian, J. W. Anderson, R. Braud, R. Jhala, and A. Vahdat. Mace: language support for building distributed systems. In SIGPLAN Conference on Programming Language Design and Implementation, pages 179–188, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. A. D. Kshemkalyani and M. Singhal. Distributed Computing: Principles, Algorithms, and Systems. Cambridge University Press, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. L. Lamport. The part-time parliament. ACM Trans. Comput. Syst., 16 (2):133–169, May 1998. ISSN 0734-2071. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. L. Lamport. Distributed algorithms in TLA (abstract). In G. Neiger, editor, Proceedings of the Nineteenth Annual ACM Symposium on Principles of Distributed Computing, July 16-19, 2000, Portland, Oregon, USA. ACM, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. L. Lamport. The pluscal algorithm language. In Theoretical Aspects of Computing - ICTAC 2009, 6th International Colloquium, Kuala Lumpur, Malaysia, August 16-20, 2009. Proceedings, pages 36–60, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Y. A. Liu, S. D. Stoller, B. Lin, and M. Gorbovitski. From clarity to efficiency for distributed algorithms. In G. T. Leavens and M. B. Dwyer, editors, Proceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2012, part of SPLASH 2012, Tucson, AZ, USA, October 21-25, 2012, pages 395–410, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. N. Lynch. Distributed Algorithms. Morgan Kaufman, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. N. A. Lynch and M. R. Tuttle. Hierarchical correctness proofs for distributed algorithms. In F. B. Schneider, editor, Proceedings of the Sixth Annual ACM Symposium on Principles of Distributed Computing, Vancouver, British Columbia, Canada, August 10-12, 1987, pages 137–151, 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. Y. Mao, F. P. Junqueira, and K. Marzullo. Mencius: Building efficient replicated state machine for wans. In R. Draves and R. van Renesse, editors, 8th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2008, December 8-10, 2008, San Diego, California, USA, Proceedings, pages 369–384. USENIX Association, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. O. Mari´c, C. Sprenger, and D. Basin. Consensus refined. In J. Karlsson and Y. Amir, editors, IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. V. R. Mark Bickford, Robert L. Constable. Logic of events, a framework to reason about distributed systems. In 2012 Languages for Distributed Algorithms Workshop, Philadelphia, PA, 2012.Google ScholarGoogle Scholar
  61. H. Miller, P. Haller, E. Burmako, and M. Odersky. Instant pickles: generating object-oriented pickler combinators for fast and extensible serialization. In A. L. Hosking, P. T. Eugster, and C. V. Lopes, editors, OOPSLA, pages 183–202, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. R. Milner, J. Parrow, and D. Walker. A calculus of mobile processes, I. Inf. Comput., 100(1):1–40, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. R. Milner, J. Parrow, and D. Walker. A calculus of mobile processes, II. Inf. Comput., 100(1):41–77, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. I. Moraru, D. G. Andersen, and M. Kaminsky. There is more consensus in egalitarian parliaments. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP ’13, pages 358–372, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-2388- 8. Google ScholarGoogle ScholarDigital LibraryDigital Library
  65. L. Moura and N. Bjorner. Z3: An efficient SMT solver. In C. Ramakrishnan and J. Rehof, editors, TACAS, volume 4963 of Lecture Notes in Computer Science, pages 337–340. Springer Berlin Heidelberg, 2008. ISBN 978-3-540-78799-0. Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. P. Parvedy, M. Raynal, and C. Travers. Early-stopping k-set agreement in synchronous systems prone to any number of process crashes. In V. Malyshkin, editor, Parallel Computing Technologies, volume 3606 of Lecture Notes in Computer Science, pages 49–58. Springer Berlin Heidelberg, 2005. ISBN 978-3-540-28126-9. Google ScholarGoogle ScholarDigital LibraryDigital Library
  67. N. Santoro and P. Widmayer. Agreement in synchronous networks with ubiquitous faults. Theor. Comput. Sci., 384(2-3):232–249, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  68. N. Schiper, V. Rahli, R. van Renesse, M. Bickford, and R. L. Constable. Developing correctly replicated databases using formal tools. In 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2014, Atlanta, GA, USA, June 23-26, 2014, pages 395–406, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  69. P. Schnoebelen. Revisiting Ackermann-Hardness for Lossy Counter Machines and Reset Petri Nets. In Mathematical Foundations of Computer Science 2010, 35th International Symposium, MFCS 2010, Brno, Czech Republic, August 23-27, 2010. Proceedings, pages 616– 628, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  70. The Netty project. http://netty.io/.Google ScholarGoogle Scholar
  71. S. Torstendahl. Open telecom platform. Ericsson Review, 1, 1997.Google ScholarGoogle Scholar
  72. T. Tsuchiya and A. Schiper. Model checking of consensus algorithms. In 26th IEEE Symposium on Reliable Distributed Systems (SRDS 2007), Beijing, China, October 10-12, 2007, pages 137–148. IEEE Computer Society, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  73. T. Tsuchiya and A. Schiper. Using bounded model checking to verify consensus algorithms. In G. Taubenfeld, editor, Distributed Computing, 22nd International Symposium, DISC 2008, Arcachon, France, September 22-24, 2008. Proceedings, pages 466–480, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  74. T. Tsuchiya and A. Schiper. Verification of consensus algorithms using satisfiability solving. Distributed Computing, 23(5-6):341–358, 2011.Google ScholarGoogle ScholarDigital LibraryDigital Library
  75. J. Widder and U. Schmid. The Theta-Model: Achieving synchrony without clocks. Distributed Computing, 22(1):29–47, Apr. 2009.Google ScholarGoogle ScholarDigital LibraryDigital Library
  76. J. R. Wilcox, D. Woos, P. Panchekha, Z. Tatlock, X. Wang, M. D. Ernst, and T. E. Anderson. Verdi: a framework for implementing and formally verifying distributed systems. In D. Grove and S. Blackburn, editors, Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, Portland, OR, USA, June 15-17, 2015, pages 357–368. ACM, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. PSync: a partially synchronous language for fault-tolerant distributed algorithms

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader
    About Cookies On This Site

    We use cookies to ensure that we give you the best experience on our website.

    Learn more

    Got it!