Abstract
Distributed systems are difficult to implement correctly because they must handle both concurrency and failures: machines may crash at arbitrary points and networks may reorder, drop, or duplicate packets. Further, their behavior is often too complex to permit exhaustive testing. Bugs in these systems have led to the loss of critical data and unacceptable service outages. We present Verdi, a framework for implementing and formally verifying distributed systems in Coq. Verdi formalizes various network semantics with different faults, and the developer chooses the most appropriate fault model when verifying their implementation. Furthermore, Verdi eases the verification burden by enabling the developer to first verify their system under an idealized fault model, then transfer the resulting correctness guarantees to a more realistic fault model without any additional proof burden. To demonstrate Verdi's utility, we present the first mechanically checked proof of linearizability of the Raft state machine replication algorithm, as well as verified implementations of a primary-backup replication system and a key-value store. These verified systems provide similar performance to unverified equivalents.
- Amazon. Summary of the Amazon EC2 and Amazon RDS service disruption in the US East Region, Apr. 2011. http://aws.amazon.com/ message/65648/.Google Scholar
- S. Bishop, M. Fairbairn, M. Norrish, P. Sewell, M. Smith, and K. Wansbrough. Engineering with logic: HOL specification and symbolic-evaluation testing for TCP implementations. In Proceedings of the 33rd ACM Symposium on Principles of Programming Languages (POPL), pages 55–66, Charleston, SC, Jan. 2006. Google Scholar
Digital Library
- T. D. Chandra, R. Griesemer, and J. Redstone. Paxos made live: An engineering perspective. In Proceedings of the 26th ACM SIGACTSIGOPS Symposium on Principles of Distributed Computing (PODC), pages 398–407, Portland, OR, Aug. 2007. Google Scholar
Digital Library
- A. Chlipala. Mostly-automated verification of low-level programs in computational separation logic. In Proceedings of the 2011 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 234–245, San Jose, CA, June 2011. Google Scholar
Digital Library
- R. L. Constable, S. F. Allen, H. M. Bromley, W. R. Cleaveland, J. F. Cremer, R. W. Harper, D. J. Howe, T. B. Knoblock, N. P. Mendler, P. Panangaden, J. T. Sasaki, and S. F. Smith. Implementing Mathematics with The Nuprl Proof Development System. Prentice Hall, 1986. Google Scholar
Digital Library
- CoreOS. etcd: A highly-available key value store for shared configuration and service discovery, 2014.Google Scholar
- https://github.com/coreos/etcd.Google Scholar
- S. J. Garland and N. Lynch. Using I/O automata for developing distributed systems. In Foundations of Component-based Systems. Cambridge University Press, 2000. Google Scholar
Digital Library
- D. Geels, G. Altekar, P. Maniatis, T. Roscoe, and I. Stoica. Friday: Global comprehension for distributed replay. In Proceedings of the 4th Symposium on Networked Systems Design and Implementation (NSDI), pages 285–298, Cambridge, MA, Apr. 2007. Google Scholar
Digital Library
- A. Guha, M. Reitblatt, and N. Foster. Machine-verified network controllers. In Proceedings of the 2013 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 483–494, Seattle, WA, June 2013. Google Scholar
Digital Library
- Z. Guo, S. McDirmid, M. Yang, L. Zhuang, P. Zhang, Y. Luo, T. Bergan, P. Bodik, M. Musuvathi, Z. Zhang, and L. Zhou. Failure recovery: When the cure is worse than the disease. In Proceedings of the HotOS XIV, Santa Ana Pueblo, NM, May 2013. Google Scholar
Digital Library
- M. Hayden. The Ensemble System. PhD thesis, Cornell University, 1998. Google Scholar
Digital Library
- M. P. Herlihy and J. M. Wing. Linearizability: A correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems, 12(3):463–492, 1990. Google Scholar
Digital Library
- J. J. Hickey, N. Lynch, and R. van Renesse. Specifications and proofs for Ensemble layers. In Proceedings of the 15th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), pages 119–133, York, UK, Mar. 2009. Google Scholar
Digital Library
- High Scalability. The updated big list of articles on the Amazon outage, May 2011. http://highscalability.com/blog/2011/5/2/theupdated-big-list-of-articles-on-the-amazon-outage.html.Google Scholar
- D. Jackson. Software Abstractions: Logic, Language, and Analysis. MIT Press, Feb. 2012. Google Scholar
Digital Library
- C. Killian, J. W. Anderson, R. Braud, R. Jhala, and A. Vahdat. Mace: Language support for building distributed systems. In Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 179–188, San Diego, CA, June 2007. Google Scholar
Digital Library
- C. Killian, J. W. Anderson, R. Jhala, and A. Vahdat. Life, death, and the critical transition: Finding liveness bugs in systems code. In Proceedings of the 4th Symposium on Networked Systems Design and Implementation (NSDI), pages 243–256, Cambridge, MA, Apr. 2007. Google Scholar
Digital Library
- G. Klein, K. Elphinstone, G. Heiser, J. Andronick, D. Cock, P. Derrin, D. Elkaduwe, K. Engelhardt, M. Norrish, R. Kolanski, T. Sewell, H. Tuch, and S. Winwood. seL4: Formal verification of an OS kernel. In Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP), pages 207–220, Big Sky, MT, Oct. 2009. Google Scholar
Digital Library
- L. Lamport. Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers. Addison-Wesley Professional, July 2002. Google Scholar
Digital Library
- L. Lamport. Thinking for programmers, Apr. 2014. http://channel9.msdn.com/Events/Build/2014/3-642.Google Scholar
- V. Le, M. Afshari, and Z. Su. Compiler validation via equivalence modulo inputs. In Proceedings of the 2014 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 216–226, Edinburgh, UK, June 2014. Google Scholar
Digital Library
- X. Leroy. Formal verification of a realistic compiler. Communications of the ACM, 52(7):107–115, July 2009. Google Scholar
Digital Library
- X. Liu, C. Kreitz, R. van Renesse, J. Hickey, M. Hayden, K. Birman, and R. L. Constable. Building reliable, high-performance communication systems from components. In Proceedings of the 17th ACM Symposium on Operating Systems Principles (SOSP), pages 80–92, Kiawah Island, SC, Dec. 1999. Google Scholar
Digital Library
- X. Liu, Z. Guo, X. Wang, F. Chen, X. Lian, J. Tang, M. Wu, M. F. Kaashoek, and Z. Zhang. D 3 S: Debugging deployed distributed systems. In Proceedings of the 5th Symposium on Networked Systems Design and Implementation (NSDI), pages 423–437, San Francisco, CA, Apr. 2008. Google Scholar
Digital Library
- N. A. Lynch. Distributed Algorithms. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1996. ISBN 1558603484. Google Scholar
Digital Library
- A. Nanevski, G. Morrisett, A. Shinnar, P. Govereau, and L. Birkedal. Ynot: Dependent types for imperative programs. In Proceedings of the 13th ACM SIGPLAN International Conference on Functional Programming (ICFP), Victoria, British Columbia, Canada, Sept. 2008. Google Scholar
Digital Library
- C. Newcombe, T. Rath, F. Zhang, B. Munteanu, M. Brooker, and M. Deardeuff. Use of formal methods at Amazon Web Services, Sept. 2014. http://research.microsoft.com/enus/um/people/lamport/tla/formal-methods-amazon.pdf.Google Scholar
- NYTimes. Amazon’s trouble raises cloud computing doubts, Apr. 2011. http://www.nytimes.com/2011/04/23/technology/23cloud.html.Google Scholar
- D. Ongaro. Consensus: Bridging Theory and Practice. PhD thesis, Stanford University, Aug. 2014.Google Scholar
Digital Library
- D. Ongaro and J. Ousterhout. In search of an understandable consensus algorithm. In Proceedings of the 2014 USENIX Annual Technical Conference, pages 305–319, Philadelphia, PA, June 2014. Google Scholar
Digital Library
- J. L. Peterson. Petri nets. ACM Computing Surveys, pages 223–252, Sept. 1977. Google Scholar
Digital Library
- V. Rahli. Interfacing with proof assistants for domain specific programming using EventML. In Proceedings of the 10th International Workshop On User Interfaces for Theorem Provers, Bremen, Germany, July 2012.Google Scholar
- D. Ricketts, V. Robert, D. Jang, Z. Tatlock, and S. Lerner. Automating formal proofs for reactive systems. In Proceedings of the 2014 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 452–462, Edinburgh, UK, June 2014. Google Scholar
Digital Library
- T. Ridge. Verifying distributed systems: The operational approach. In Proceedings of the 36th ACM Symposium on Principles of Programming Languages (POPL), pages 429–440, Savannah, GA, Jan. 2009. Google Scholar
Digital Library
- D. Sangiorgi and D. Walker. PI-Calculus: A Theory of Mobile Processes. Cambridge University Press, New York, NY, USA, 2001. ISBN 0521781779. Google Scholar
Digital Library
- N. Schiper, V. Rahli, R. van Renesse, M. Bickford, and R. L. Constable. Developing correctly replicated databases using formal tools. In Proceedings of the 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pages 395–406, Atlanta, GA, June 2014. Google Scholar
Digital Library
- Verdi. Verdi, 2014. https://github.com/uwplse/verdi.Google Scholar
- M. Yabandeh, N. Kneževi´c, D. Kosti´c, and V. Kuncak. CrystalBall: Predicting and preventing inconsistencies in deployed distributed systems. In Proceedings of the 5th Symposium on Networked Systems Design and Implementation (NSDI), pages 229–244, San Francisco, CA, Apr. 2008. Google Scholar
Digital Library
- J. Yang and C. Hawblitzel. Safe to the last instruction: Automated verification of a type-safe operating system. In Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 99–110, Toronto, Canada, June 2010. Google Scholar
Digital Library
- J. Yang, T. Chen, M. Wu, Z. Xu, X. Liu, H. Lin, M. Yang, F. Long, L. Zhang, and L. Zhou. M O D IST : Transparent model checking of unmodified distributed systems. In Proceedings of the 6th Symposium on Networked Systems Design and Implementation (NSDI), pages 213–228, Boston, MA, Apr. 2009. Google Scholar
Digital Library
- X. Yang, Y. Chen, E. Eide, and J. Regehr. Finding and understanding bugs in C compilers. In Proceedings of the 2011 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 283–294, San Jose, CA, June 2011. Google Scholar
Digital Library
- D. Yuan, Y. Luo, X. Zhuang, G. R. Rodrigues, X. Zhao, Y. Zhang, P. U. Jain, and M. Stumm. Simple testing can prevent most critical failures: An analysis of production failures in distributed data-intensive systems. In Proceedings of the 11th Symposium on Operating Systems Design and Implementation (OSDI), pages 249–265, Broomfield, CO, Oct. 2014. Google Scholar
Digital Library
- P. Zave. Using lightweight modeling to understand Chord. ACM SIGCOMM Computer Communication Review, 42(2):49–57, Apr. 2012. Google Scholar
Digital Library
Index Terms
Verdi: a framework for implementing and formally verifying distributed systems
Recommendations
Verdi: a framework for implementing and formally verifying distributed systems
PLDI '15: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and ImplementationDistributed systems are difficult to implement correctly because they must handle both concurrency and failures: machines may crash at arbitrary points and networks may reorder, drop, or duplicate packets. Further, their behavior is often too complex ...
Planning for change in a formal verification of the raft consensus protocol
CPP 2016: Proceedings of the 5th ACM SIGPLAN Conference on Certified Programs and ProofsWe present the first formal verification of state machine safety for the Raft consensus protocol, a critical component of many distributed systems. We connected our proof to previous work to establish an end-to-end guarantee that our implementation ...
A unified Coq framework for verifying C programs with floating-point computations
CPP 2016: Proceedings of the 5th ACM SIGPLAN Conference on Certified Programs and ProofsWe provide concrete evidence that floating-point computations in C programs can be verified in a homogeneous verification setting based on Coq only, by evaluating the practicality of the combination of the formal semantics of CompCert Clight and the ...






Comments