Abstract
Software applications run on a variety of platforms (filesystems, virtual slices, mobile hardware, etc.) that do not provide 100% uptime. As a result, these applications may crash at any moment, losing volatile data, and when relaunched they must correctly recover from potentially inconsistent states left on persistent storage. From a verification perspective, crash recovery bugs can be particularly frustrating: even when a program has been formally proved to satisfy a property, the proof is foiled by external events that crash and restart the program. In this paper we first provide a hierarchical formal model of what it means for a program to be crash recoverable. Our model captures the recoverability of many real-world programs, including those in our evaluation, which use sophisticated recovery algorithms such as shadow paging and write-ahead logging. Next, we introduce a novel technique that automatically proves that a program correctly recovers from a crash via a reduction to reachability. Our technique takes a control-flow automaton as input and transforms it into an encoding that blends the capture of snapshots of pre-crash states into a symbolic search for a proof that recovery terminates and that every recovered execution simulates some crash-free execution. The encoding is designed so that existing abstraction techniques can be applied to carry out the work needed to prove recoverability. We have implemented our technique in a tool called Eleven82, which analyzes C programs to detect recoverability bugs or prove their absence. We have applied the tool to benchmark examples drawn from industrial file systems and databases, including GDBM, LevelDB, LMDB, PostgreSQL, SQLite, VMware and ZooKeeper. Within minutes, the tool either discovers bugs or proves that these fragments are crash recoverable.
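To make the notion of crash recoverability concrete, the following is a minimal sketch under strong simplifying assumptions; it is not the paper's symbolic encoding and not how Eleven82 works. It enumerates every crash point of a toy program by brute force, runs a recovery routine on the persistent state left behind, and checks that the recovered state matches a state of the crash-free execution (here, only the state before or after the whole update). All names (`find_recovery_bugs`, `view`, the step functions) are hypothetical, each step is assumed to be an atomic persistent write, and termination of recovery is not checked.

```python
from copy import deepcopy

def crash_states(initial, steps):
    """Yield (k, state) for a crash after the first k steps, k = 0..len(steps)."""
    state = deepcopy(initial)
    yield 0, deepcopy(state)
    for k, step in enumerate(steps, start=1):
        step(state)
        yield k, deepcopy(state)

def find_recovery_bugs(initial, steps, recover, view):
    """Return the crash points whose recovered state matches no crash-free state."""
    final = deepcopy(initial)
    for step in steps:          # the single crash-free execution
        step(final)
    # Assumption: the update must look atomic, so only these two views are allowed.
    allowed = {view(initial), view(final)}
    bugs = []
    for k, state in crash_states(initial, steps):
        recover(state)          # relaunch: run recovery on what survived the crash
        if view(state) not in allowed:
            bugs.append((k, view(state)))
    return bugs

# Persistent storage of the toy program: a two-field record plus a redo log.
INITIAL = {"a": 0, "b": 0, "log": None, "committed": False}

def view(s):
    """The part of storage that clients observe."""
    return (s["a"], s["b"])

def write_a(s): s["a"] = 1
def write_b(s): s["b"] = 1

# Variant 1: update in place, with no recovery logic.
IN_PLACE = [write_a, write_b]
def no_recovery(s):
    pass

# Variant 2: write-ahead logging with a redo log.
def log_intent(s): s["log"] = (1, 1)
def commit(s): s["committed"] = True
def clear_log(s):
    s["committed"] = False
    s["log"] = None

WAL = [log_intent, commit, write_a, write_b, clear_log]

def recover_wal(s):
    """Redo a committed update; discard an uncommitted one."""
    if s["committed"]:
        s["a"], s["b"] = s["log"]
    s["committed"] = False
    s["log"] = None

if __name__ == "__main__":
    print("in-place bugs:", find_recovery_bugs(INITIAL, IN_PLACE, no_recovery, view))
    print("WAL bugs:     ", find_recovery_bugs(INITIAL, WAL, recover_wal, view))
```

In this toy setting the in-place variant is flagged because a crash between the two writes leaves a half-updated record, while the logging variant passes at every crash point. Explicit enumeration like this only works for tiny finite programs; the abstract's point is that the same question can be posed as a reachability problem and handed to existing abstraction-based verifiers.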
Reducing crash recoverability to reachability
POPL '16: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages