Abstract
Distributed algorithms have many mission-critical applications ranging from embedded systems and replicated databases to cloud computing. Due to asynchronous communication, process faults, or network failures, these algorithms are difficult to design and verify. Many algorithms achieve fault tolerance by using threshold guards that, for instance, ensure that a process waits until it has received an acknowledgment from a majority of its peers. Consequently, domain-specific languages for fault-tolerant distributed systems offer language support for threshold guards.
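To make the notion of a threshold guard concrete, here is a minimal sketch of the majority guard mentioned above. The function name `majority_guard` and the convention that `n` counts all processes taking part in the round are assumptions made for illustration; they are not taken from any particular domain-specific language.

```python
def majority_guard(acks_received: int, n: int) -> bool:
    """Return True once acknowledgments from a strict majority of n processes arrived.

    Hypothetical helper: `n` is assumed to be the total number of processes
    whose acknowledgments are counted.
    """
    if n <= 0 or acks_received < 0:
        raise ValueError("n must be positive and acks_received non-negative")
    # Integer form of acks_received > n / 2, which avoids floating point.
    return 2 * acks_received > n


# A process would stay blocked while the guard is false.
blocked = not majority_guard(2, 5)   # 2 of 5 is not a majority
unblocked = majority_guard(3, 5)     # 3 of 5 is a majority
```

In a parameterized setting the guard is an inequality over the parameters (here `n`) rather than a fixed number, which is why verification has to cover all admissible parameter values.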
We introduce an automated method for model checking of safety and liveness of threshold-guarded distributed algorithms in systems where the number of processes and the fraction of faulty processes are parameters. Our method is based on a short counterexample property: if a distributed algorithm violates a temporal specification (in a fragment of LTL), then there is a counterexample whose length is bounded and independent of the parameters. We prove this property by (i) characterizing executions depending on the structure of the temporal formula, and (ii) using commutativity of transitions to accelerate and shorten executions. We extended the ByMC toolset (Byzantine Model Checker) with our technique, and verified liveness and safety of 10 prominent fault-tolerant distributed algorithms, most of which were out of reach for existing techniques.
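The role of commutativity and acceleration in step (ii) can be illustrated with a toy counter system. This is only a sketch under simplifying assumptions, not the paper's formal model or the ByMC input language: the `Config` fields, the two rules `send` and `finish`, and the threshold value are all hypothetical. Because sending only increases the shared message counter, a guard that holds stays true, so steps can be reordered and identical steps can be merged into one accelerated step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    waiting: int  # processes that have not sent yet
    sent: int     # processes that have sent and wait for the guard
    done: int     # processes that passed the guard
    msgs: int     # shared counter of messages sent so far


def send(c: Config, k: int = 1) -> Config:
    """Accelerated rule: k waiting processes each send one message."""
    if not 0 <= k <= c.waiting:
        raise ValueError("not enough waiting processes")
    return Config(c.waiting - k, c.sent + k, c.done, c.msgs + k)


def finish(c: Config, threshold: int, k: int = 1) -> Config:
    """Accelerated rule: k processes pass the guard msgs >= threshold."""
    if c.msgs < threshold or not 0 <= k <= c.sent:
        raise ValueError("guard is false or not enough processes")
    return Config(c.waiting, c.sent - k, c.done + k, c.msgs)


THRESHOLD = 2  # hypothetical guard: msgs >= 2
start = Config(waiting=3, sent=0, done=0, msgs=0)

# Interleaved execution with five single steps.
interleaved = start
for step in ("send", "send", "finish", "send", "finish"):
    if step == "send":
        interleaved = send(interleaved)
    else:
        interleaved = finish(interleaved, THRESHOLD)

# Reordered and accelerated execution: all sends first, then all finishes.
accelerated = finish(send(start, k=3), THRESHOLD, k=2)
```

Both executions end in the same configuration, but the second one takes two steps instead of five, and its length does not grow with the number of processes.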
A short counterexample property for safety and liveness verification of fault-tolerant distributed algorithms
POPL '17: Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages