Reducing crash recoverability to reachability

Published: 11 January 2016

Abstract

Software applications run on a variety of platforms (filesystems, virtual slices, mobile hardware, etc.) that do not provide 100% uptime. As such, these applications may crash at any unfortunate moment losing volatile data and, when re-launched, they must be able to correctly recover from potentially inconsistent states left on persistent storage. From a verification perspective, crash recovery bugs can be particularly frustrating because, even when it has been formally proved for a program that it satisfies a property, the proof is foiled by these external events that crash and restart the program. In this paper we first provide a hierarchical formal model of what it means for a program to be crash recoverable. Our model captures the recoverability of many real world programs, including those in our evaluation which use sophisticated recovery algorithms such as shadow paging and write-ahead logging. Next, we introduce a novel technique capable of automatically proving that a program correctly recovers from a crash via a reduction to reachability. Our technique takes an input control-flow automaton and transforms it into an encoding that blends the capture of snapshots of pre-crash states into a symbolic search for a proof that recovery terminates and every recovered execution simulates some crash-free execution. Our encoding is designed to enable one to apply existing abstraction techniques in order to do the work that is necessary to prove recoverability. We have implemented our technique in a tool called Eleven82, capable of analyzing C programs to detect recoverability bugs or prove their absence. We have applied our tool to benchmark examples drawn from industrial file systems and databases, including GDBM, LevelDB, LMDB, PostgreSQL, SQLite, VMware and ZooKeeper. Within minutes, our tool is able to discover bugs or prove that these fragments are crash recoverable.
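To make the notion of crash recoverability concrete, here is a minimal explicit-state sketch in Python. It is not the paper's symbolic encoding and not how Eleven82 works: it enumerates every crash point of a toy program, snapshots the persistent state left at that point, runs recovery, and checks that the result matches a state a crash-free run could produce. All names (`run_prefix`, `first_bad_crash_point`, the `a`/`b` balances, the `log_*` keys) are hypothetical. Two simplifying assumptions are made: each step is atomic, and "simulates some crash-free execution" is approximated by all-or-nothing behaviour on the observed keys.

```python
from typing import Callable, Dict, List, Optional

Disk = Dict[str, int]            # toy model of persistent storage
Step = Callable[[Disk], None]    # one atomic update to persistent storage


def run_prefix(steps: List[Step], disk: Disk, n: int) -> Disk:
    """Run the first n steps on a copy of disk (a crash happens after step n)."""
    out = dict(disk)
    for step in steps[:n]:
        step(out)
    return out


def first_bad_crash_point(steps: List[Step], recover: Step, initial: Disk,
                          observed: List[str]) -> Optional[int]:
    """Return the first crash point whose recovered state is not allowed, else None."""
    def view(d: Disk):
        return tuple(d[k] for k in observed)

    # Allowed outcomes (assumption): nothing happened, or the whole run happened.
    allowed = {view(initial), view(run_prefix(steps, initial, len(steps)))}
    for n in range(len(steps) + 1):
        snapshot = run_prefix(steps, initial, n)  # pre-crash persistent state
        recover(snapshot)                         # recovery runs on relaunch
        if view(snapshot) not in allowed:
            return n
    return None


# Toy program: move 10 units from a to b.
def debit(d: Disk) -> None:
    d["a"] -= 10

def credit(d: Disk) -> None:
    d["b"] += 10

def no_recovery(d: Disk) -> None:
    pass

# Undo-log variant: save old values first, clear the log flag on commit.
def log_old(d: Disk) -> None:
    d["log_a"], d["log_b"], d["log_valid"] = d["a"], d["b"], 1

def commit(d: Disk) -> None:
    d["log_valid"] = 0

def undo_recovery(d: Disk) -> None:
    if d["log_valid"]:
        d["a"], d["b"] = d["log_a"], d["log_b"]
        d["log_valid"] = 0


INITIAL = {"a": 60, "b": 40, "log_a": 0, "log_b": 0, "log_valid": 0}
unsafe = first_bad_crash_point([debit, credit], no_recovery, INITIAL, ["a", "b"])
safe = first_bad_crash_point([log_old, debit, credit, commit], undo_recovery,
                             INITIAL, ["a", "b"])
```

In this toy, the unlogged transfer fails at the crash point between `debit` and `credit`, while the logged variant passes at every crash point. A real analysis of the kind the abstract describes would reason symbolically over a control-flow automaton rather than enumerate concrete states.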

Published in

ACM SIGPLAN Notices, Volume 51, Issue 1 (POPL '16)
January 2016, 815 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2914770
Editor: Andy Gill

POPL '16: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
January 2016, 815 pages
ISBN: 9781450335492
DOI: 10.1145/2837614

Copyright © 2016 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
