A multiparty session typing discipline for fault-tolerant event-driven distributed programming

Published: 15 October 2021

Abstract

This paper presents a formulation of multiparty session types (MPSTs) for practical fault-tolerant distributed programming. We tackle the challenges faced by session types in the context of distributed systems involving asynchronous and concurrent partial failures – such as supporting dynamic replacement of failed parties and retrying failed protocol segments in an ongoing multiparty session – in the presence of unreliable failure detection. Key to our approach is that we develop a novel model of event-driven concurrency for multiparty sessions. Inspired by real-world practices, it enables us to unify the session-typed handling of regular I/O events with failure handling and the combination of features needed to express practical fault-tolerant protocols. Moreover, the characteristics of our model allow us to prove a global progress property for well-typed processes engaged in multiple concurrent sessions, which does not hold in traditional MPST systems.

To demonstrate its practicality, we implement our framework as a toolchain and runtime for Scala, and use it to specify and implement a session-typed version of the cluster management system of the industrial-strength Apache Spark data analytics framework. Our session-typed cluster manager composes with other vanilla Spark components to give a functioning Spark runtime; e.g., it can execute existing third-party Spark applications without code modification. A performance evaluation using the TPC-H benchmark shows our prototype implementation incurs an average overhead below 10%.
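
The idea of handling regular I/O events and failure notifications through a single event-driven path can be illustrated with a small sketch. This is not the paper's Scala API and it performs no session type checking; every name here (`Message`, `Suspected`, `Coordinator`, `run`) is hypothetical and chosen only for illustration. It assumes a coordinator that hands one task to a worker, treats a failure suspicion (which may be a false positive, as with an unreliable detector) as just another event, and retries the task with a replacement worker.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """A regular I/O event: `sender` delivered `payload`."""
    sender: str
    payload: str


@dataclass(frozen=True)
class Suspected:
    """A failure event: the (possibly mistaken) detector suspects `role`."""
    role: str


class Coordinator:
    """Hypothetical coordinator: assigns one task and retries on suspicion."""

    def __init__(self, workers):
        self.spare = deque(workers)  # workers not yet tried
        self.active = None           # worker currently holding the task
        self.result = None           # set once the active worker replies
        self.log = []                # trace of actions, for inspection

    def start(self):
        self._assign()

    def _assign(self):
        # Dynamic replacement: hand the task to the next spare worker.
        if not self.spare:
            self.active = None
            self.log.append("abort")
            return
        self.active = self.spare.popleft()
        self.log.append(f"send task to {self.active}")

    def handle(self, event):
        # Single dispatch point shared by I/O events and failure events.
        if self.result is not None:
            return  # protocol already finished; late events are ignored
        if isinstance(event, Message) and event.sender == self.active:
            self.result = event.payload
            self.log.append(f"done via {event.sender}")
        elif isinstance(event, Suspected) and event.role == self.active:
            # The suspicion may be wrong; the segment is retried regardless.
            self.log.append(f"suspect {event.role}")
            self._assign()
        # Events from replaced or unknown roles are dropped.


def run(coordinator, events):
    """Feed a fixed event sequence through the coordinator's handler."""
    coordinator.start()
    for event in events:
        coordinator.handle(event)
    return coordinator.result
```

For example, `run(Coordinator(["w1", "w2"]), [Suspected("w1"), Message("w1", "late"), Message("w2", "42")])` retries the task on `w2` after `w1` is suspected and discards the late reply from `w1`. A session-typed framework such as the one the abstract describes would additionally check, statically, that handlers like these conform to a protocol; this sketch makes no attempt at that.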

Supplemental Material

Auxiliary Presentation Video

This is the presentation video of our OOPSLA 2021 talk on our paper “A Multiparty Session Typing Discipline for Fault-Tolerant Event-Driven Distributed Programming”.
