research-article

A short counterexample property for safety and liveness verification of fault-tolerant distributed algorithms

Published: 01 January 2017

Abstract

Distributed algorithms have many mission-critical applications ranging from embedded systems and replicated databases to cloud computing. Due to asynchronous communication, process faults, or network failures, these algorithms are difficult to design and verify. Many algorithms achieve fault tolerance by using threshold guards that, for instance, ensure that a process waits until it has received an acknowledgment from a majority of its peers. Consequently, domain-specific languages for fault-tolerant distributed systems offer language support for threshold guards.
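As an illustration of the threshold guards described above, here is a minimal Python sketch. The names `majority_guard`, `quorum_guard`, and `Waiter` are hypothetical and do not come from any particular domain-specific language. The sketch only shows how such a guard compares a count of received messages against an expression over the parameters n (number of processes) and t (maximum number of faulty processes).

```python
from dataclasses import dataclass, field


def majority_guard(received: int, n: int) -> bool:
    """True once strictly more than half of the n processes have been heard from."""
    # Writing the guard as 2 * received > n keeps it in integer arithmetic.
    return 2 * received > n


def quorum_guard(received: int, n: int, t: int) -> bool:
    """True once n - t messages have arrived. With at most t faulty processes,
    the n - t correct processes alone suffice to make this guard true eventually."""
    return received >= n - t


@dataclass
class Waiter:
    """Hypothetical process that waits for acknowledgments from a majority of its peers."""
    n: int                                   # number of processes (a parameter)
    acks: set = field(default_factory=set)   # ids of peers whose ack has arrived
    done: bool = False                       # becomes True once the guard fires

    def on_ack(self, sender: int) -> None:
        self.acks.add(sender)                # duplicate acks from one sender count once
        if not self.done and majority_guard(len(self.acks), self.n):
            self.done = True
```

For example, with n = 5, a `Waiter` that has received acknowledgments from peers 0 and 1 (even if peer 1 acknowledges twice) is still waiting; a third distinct acknowledgment makes 2 · 3 > 5 true and releases it.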

We introduce an automated method for model checking of safety and liveness of threshold-guarded distributed algorithms in systems where the number of processes and the fraction of faulty processes are parameters. Our method is based on a short counterexample property: if a distributed algorithm violates a temporal specification (in a fragment of LTL), then there is a counterexample whose length is bounded and independent of the parameters. We prove this property by (i) characterizing executions depending on the structure of the temporal formula, and (ii) using commutativity of transitions to accelerate and shorten executions. We extended the ByMC toolset (Byzantine Model Checker) with our technique, and verified liveness and safety of 10 prominent fault-tolerant distributed algorithms, most of which were out of reach for existing techniques.
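The acceleration idea in step (ii) can be illustrated on a counter system, where a configuration records how many processes are in each local state together with the values of shared variables. The following Python sketch is a simplified, hypothetical model loosely inspired by threshold-based reliable broadcast; it is not ByMC's input format or the paper's formal definitions, and the state names `V0`, `V1`, `SE`, `AC` and rules `SEND`, `RELAY`, `ACCEPT` are made up for illustration. It assumes a single shared variable `x` that is only incremented and guards of the form x ≥ c. Under these assumptions, moving k processes along a rule in one accelerated step has the same effect as k unit steps.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple

# Configuration: number of processes per local state, plus shared variable "x".
Config = Dict[str, int]


@dataclass(frozen=True)
class Rule:
    src: str                          # local state the processes leave
    dst: str                          # local state the processes enter
    guard: Callable[[Config], bool]   # assumed rising guard, i.e., x >= c
    inc: int                          # non-negative increment of x per process


def apply(cfg: Config, rule: Rule, k: int = 1) -> Config:
    """Accelerated transition: move k processes along `rule` in one step.
    Assumption: x never decreases and guards have the form x >= c, so a guard
    that holds before the step holds for each of the k unit moves it abbreviates."""
    if k < 0 or cfg[rule.src] < k or not rule.guard(cfg):
        raise ValueError("transition not enabled")
    nxt = dict(cfg)
    nxt[rule.src] -= k
    nxt[rule.dst] += k
    nxt["x"] += k * rule.inc
    return nxt


def run(cfg: Config, schedule: Iterable[Tuple[Rule, int]]) -> Config:
    """Apply a schedule, i.e., a sequence of (rule, k) accelerated transitions."""
    for rule, k in schedule:
        cfg = apply(cfg, rule, k)
    return cfg


N, T = 4, 1  # example parameter values: n = 4 processes, at most t = 1 faulty

SEND = Rule("V1", "SE", lambda c: True, 1)               # initiators send a message
RELAY = Rule("V0", "SE", lambda c: c["x"] >= T + 1, 1)   # others send after t + 1 messages
ACCEPT = Rule("SE", "AC", lambda c: c["x"] >= N - T, 0)  # accept after n - t messages

INIT: Config = {"V0": 2, "V1": 2, "SE": 0, "AC": 0, "x": 0}
```

With n = 4 and t = 1, the schedule `[(SEND, 2), (RELAY, 2), (ACCEPT, 4)]` moves all four processes to `AC` in three accelerated steps instead of eight unit steps. From the configuration reached by `[(SEND, 2), (RELAY, 1)]`, the steps `(RELAY, 1)` and `(ACCEPT, 3)` can be applied in either order with the same result, a small instance of the kind of commutativity that allows executions to be reordered and shortened.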

