Abstract
Applications depend on persistent storage to recover state after system crashes. But the POSIX file system interfaces do not define the possible outcomes of a crash. As a result, it is difficult for application writers to correctly understand the ordering of and dependencies between file system operations, which can lead to corrupt application state and, in the worst case, catastrophic data loss. This paper presents crash-consistency models, analogous to memory consistency models, which describe the behavior of a file system across crashes. Crash-consistency models include both litmus tests, which demonstrate allowed and forbidden behaviors, and axiomatic and operational specifications. We present a formal framework for developing crash-consistency models, and a toolkit, called Ferrite, for validating those models against real file system implementations. We develop a crash-consistency model for ext4, and use Ferrite to demonstrate unintuitive crash behaviors of the ext4 implementation. To demonstrate the utility of crash-consistency models to application writers, we use our models to prototype proof-of-concept verification and synthesis tools, as well as new library interfaces for crash-safe applications.
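To make the idea of a litmus test concrete, the following is a minimal sketch, not the paper's formalism and not Ferrite's interface. It assumes a toy trace of file operations and two hypothetical ordering models (`prefix_model` and `weak_model`, both invented here for illustration), enumerates every set of operations that could have reached disk at a crash, and asks whether an undesirable outcome is allowed.

```python
from itertools import combinations

# Toy traces (hypothetical, for illustration only): (operation, file).
REPLACE = [("write", "tmp"), ("rename", "tmp")]
REPLACE_FSYNC = [("write", "tmp"), ("fsync", "tmp"), ("rename", "tmp")]

def prefix_model(i, j, ops):
    """Assumed strict model: every earlier operation persists first."""
    return i < j

def weak_model(i, j, ops):
    """Assumed weak model: only fsync orders persistence.

    An fsync persists after earlier operations on the same file, and every
    operation issued after an fsync persists after that fsync.
    """
    if i >= j:
        return False
    same_file_barrier = ops[j][0] == "fsync" and ops[i][1] == ops[j][1]
    issued_after_fsync = ops[i][0] == "fsync"
    return same_file_barrier or issued_after_fsync

def crash_states(ops, must_precede):
    """All sets of persisted operations consistent with the ordering model."""
    n = len(ops)
    states = []
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            chosen = set(subset)
            ok = all(i in chosen
                     for j in chosen
                     for i in range(n)
                     if must_precede(i, j, ops))
            if ok:
                states.append(frozenset(chosen))
    return states

def allows(ops, must_precede, outcome):
    """Litmus check: is some crash state with this outcome permitted?"""
    return any(outcome(state) for state in crash_states(ops, must_precede))

def rename_without_data(ops):
    """Outcome: the rename reached disk but the write did not."""
    write = next(k for k, op in enumerate(ops) if op[0] == "write")
    rename = next(k for k, op in enumerate(ops) if op[0] == "rename")
    return lambda state: rename in state and write not in state
```

Under these assumptions, `allows(REPLACE, weak_model, rename_without_data(REPLACE))` reports that the weak model permits the rename to persist without the data, while the same check on `REPLACE_FSYNC` forbids it. A real crash-consistency model would be considerably richer than this two-rule toy.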
Specifying and Checking File System Crash-Consistency Models
ASPLOS '16: Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems