Abstract
We introduce a new reliability infrastructure for file systems called I/O shepherding. I/O shepherding allows a file system developer to craft nuanced reliability policies to detect and recover from a wide range of storage system failures. We incorporate shepherding into the Linux ext3 file system through a set of changes to the consistency management subsystem, layout engine, disk scheduler, and buffer cache. The resulting file system, CrookFS, enables a broad class of policies to be easily and correctly specified. We implement numerous policies, incorporating data protection techniques such as retry, parity, mirrors, checksums, sanity checks, and data structure repairs; even complex policies can be implemented in fewer than 100 lines of code, confirming the power and simplicity of the shepherding framework. We also demonstrate that shepherding is properly integrated, adding less than 5% overhead to the I/O path.
Improving file system reliability with I/O shepherding
SOSP '07: Proceedings of the twenty-first ACM SIGOPS Symposium on Operating Systems Principles