Abstract
Reliable storage systems depend in part on "write-before" relationships where some changes to stable storage are delayed until other changes commit. A journaled file system, for example, must commit a journal transaction before applying that transaction's changes, and soft updates and other consistency enforcement mechanisms have similar constraints, implemented in each case in system-dependent ways. We present a general abstraction, the patch, that makes write-before relationships explicit and file system agnostic. A patch-based file system implementation expresses dependencies among writes, leaving lower system layers to determine write orders that satisfy those dependencies. Storage system modules can examine and modify the dependency structure, and generalized file system dependencies are naturally exportable to user level. Our patch-based storage system, Featherstitch, includes several important optimizations that reduce patch overheads by orders of magnitude. Our ext2 prototype runs in the Linux kernel and supports asynchronous writes, soft updates-like dependencies, and journaling. It outperforms similarly reliable ext2 and ext3 configurations on some, but not all, benchmarks. It also supports unusual configurations, such as correct dependency enforcement within a loopback file system, and lets applications define consistency requirements without micromanaging how those requirements are satisfied.
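The write-before idea in the abstract can be illustrated with a small sketch. It is not Featherstitch's API: the `Patch` class, the `write_order` function, the patch names, and the block numbers are all hypothetical, chosen only to show how a lower layer might derive a valid write order from dependencies declared by a file system. The example models the journaling constraint described above, where the journal copy must precede the commit record, which must precede the in-place update.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Patch:
    """Hypothetical stand-in for a patch: one change to one disk block."""
    name: str
    block: int
    # Patches that must reach stable storage before this one.
    depends_on: list = field(default_factory=list)


def write_order(patches):
    """Return patch names in an order that satisfies every dependency.

    Uses Kahn's topological sort; raises ValueError on a dependency cycle.
    """
    remaining = {p.name: len(p.depends_on) for p in patches}
    dependents = {p.name: [] for p in patches}
    for p in patches:
        for dep in p.depends_on:
            dependents[dep.name].append(p.name)

    # Patches with no unwritten dependencies may be written immediately.
    ready = deque(name for name, count in remaining.items() if count == 0)
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for later in dependents[name]:
            remaining[later] -= 1
            if remaining[later] == 0:
                ready.append(later)

    if len(order) != len(patches):
        raise ValueError("dependency cycle: no valid write order")
    return order


# Journaling example (illustrative block numbers).
journal = Patch("journal_copy", block=100)
commit = Patch("commit_record", block=101, depends_on=[journal])
update = Patch("in_place_update", block=7, depends_on=[commit])

# The file system may create patches in any order; the lower layer sorts them.
order = write_order([update, commit, journal])
print(order)
```

The file system code only states which patch depends on which; the ordering decision lives entirely in `write_order`, which mirrors the separation of concerns the abstract describes.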
Generalized file system dependencies
SOSP '07: Proceedings of the twenty-first ACM SIGOPS Symposium on Operating Systems Principles