Abstract
This paper presents a formulation of multiparty session types (MPSTs) for practical fault-tolerant distributed programming. We tackle the challenges that asynchronous and concurrent partial failures pose for session types in distributed systems, such as supporting the dynamic replacement of failed parties and retrying failed protocol segments within an ongoing multiparty session, all in the presence of unreliable failure detection. Key to our approach is a novel model of event-driven concurrency for multiparty sessions. Inspired by real-world practice, this model lets us unify the session-typed handling of regular I/O events with failure handling, together with the combination of features needed to express practical fault-tolerant protocols. Moreover, the characteristics of our model allow us to prove a global progress property for well-typed processes engaged in multiple concurrent sessions, a property that does not hold in traditional MPST systems.
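The following minimal Python sketch illustrates the central idea. It is not the paper's calculus, Scala toolchain, or API. A single coordinator role handles two kinds of events through one handler: regular message events and failure-suspicion events. On suspicion, it replaces the suspected party with a spare and retries the protocol segment. Several details are hypothetical, chosen only for illustration:

- the role names (`Master`, workers `w1`, `w2`);
- the labels `Task`/`Done`;
- the attempt-number tag used to discard late replies.

Because the failure detector is assumed to be unreliable, a suspected party may actually be alive. Its late replies must therefore be ignored rather than trusted.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class Msg:
    """A regular I/O event: a message received from another role."""
    sender: str
    label: str
    attempt: int  # which retry of the protocol segment the sender was serving


@dataclass(frozen=True)
class Suspect:
    """A failure event: the (unreliable) detector suspects `party` has crashed."""
    party: str


Event = Union[Msg, Suspect]


class Master:
    """Hypothetical coordinator role that delegates one task to a worker.

    Regular messages and failure suspicions go through the same handler,
    treating failures as just another kind of event.
    """

    def __init__(self, workers: List[str]) -> None:
        self.spares = list(workers)  # candidate parties, in preference order
        self.current: Optional[str] = None
        self.attempt = 0
        self.done = False
        self.log: List[str] = []
        self._start_segment()

    def _start_segment(self) -> None:
        # (Re)run the protocol segment with a fresh party and attempt number.
        if not self.spares:
            raise RuntimeError("no replacement party available")
        self.current = self.spares.pop(0)
        self.attempt += 1
        self.log.append(f"send Task(attempt={self.attempt}) to {self.current}")

    def handle(self, ev: Event) -> None:
        if self.done:
            return  # segment already completed; later events are irrelevant
        if isinstance(ev, Suspect):
            if ev.party == self.current:
                # The suspicion may be false, but we replace the party and retry.
                self.log.append(f"suspect {ev.party}: replace and retry")
                self._start_segment()
            # Suspicions about parties outside the current segment are ignored.
        elif (ev.label == "Done" and ev.sender == self.current
              and ev.attempt == self.attempt):
            self.done = True
            self.log.append(f"recv Done from {ev.sender}")
        else:
            # A late reply from a replaced (possibly falsely suspected) party,
            # or an unexpected label: discard it instead of trusting it.
            self.log.append(
                f"discard {ev.label}(attempt={ev.attempt}) from {ev.sender}")


if __name__ == "__main__":
    m = Master(["w1", "w2"])
    for ev in [Suspect("w1"), Msg("w1", "Done", 1), Msg("w2", "Done", 2)]:
        m.handle(ev)
    print("\n".join(m.log))
```

Consider the demo trace: a suspicion of `w1`, then a late `Done` from `w1`, then `Done` from `w2`. The coordinator retries the segment with `w2`, discards the stale reply from `w1`, and completes. Tagging messages with an attempt number is a design choice of this sketch for keeping retries safe under false suspicions. It is not claimed to be the mechanism used by the paper's framework.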
To demonstrate the framework's practicality, we implement it as a toolchain and runtime for Scala, and use it to specify and implement a session-typed version of the cluster management system of the industrial-strength Apache Spark data analytics framework. Our session-typed cluster manager composes with other vanilla Spark components to give a functioning Spark runtime; for example, it can execute existing third-party Spark applications without code modification. A performance evaluation using the TPC-H benchmark shows that our prototype implementation incurs an average overhead below 10%.
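The abstract does not spell out how "average overhead" is computed. One plausible reading is the mean per-query relative slowdown over the n TPC-H queries measured. This is an assumption, not the paper's stated definition. Here T_i^st is the runtime of query i with the session-typed cluster manager, and T_i^van is its runtime on vanilla Spark:

```latex
\overline{\mathrm{ovh}} \;=\; \frac{1}{n}\sum_{i=1}^{n}
  \frac{T^{\mathrm{st}}_i - T^{\mathrm{van}}_i}{T^{\mathrm{van}}_i},
\qquad \text{reported: } \overline{\mathrm{ovh}} < 0.10
```

An evaluation could instead report the ratio of total runtimes, which weights long-running queries more heavily.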
A multiparty session typing discipline for fault-tolerant event-driven distributed programming