Improving storage system availability with D-GRAID

Published: 01 May 2005

Abstract

We present the design, implementation, and evaluation of D-GRAID, a gracefully degrading and quickly recovering RAID storage array. D-GRAID ensures that most files within the file system remain available even when an unexpectedly high number of faults occur. D-GRAID achieves high availability through aggressive replication of semantically critical data, and fault-isolated placement of logically related data. D-GRAID also recovers from failures quickly, restoring only live file system data to a hot spare. Both graceful degradation and live-block recovery are implemented in a prototype SCSI-based storage system underneath unmodified file systems, demonstrating that powerful “file-system like” functionality can be implemented within a “semantically smart” disk system behind a narrow block-based interface.
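
The effect of fault-isolated placement described above can be illustrated with a toy model. The sketch below is not the authors' implementation: the function names, the disk and file counts, and the assumption that data blocks carry no redundancy while metadata is replicated on every disk are all hypothetical choices made for illustration. It compares how many files stay fully readable when a file's blocks are striped across all disks versus kept together on one disk.

```python
def place_striped(num_files, blocks_per_file, num_disks):
    """Stripe each file's blocks round-robin across the disks.

    Returns, for each file, the set of disks holding at least one block.
    """
    return [
        {(f + b) % num_disks for b in range(blocks_per_file)}
        for f in range(num_files)
    ]


def place_fault_isolated(num_files, num_disks):
    """Keep all blocks of a file on a single disk (one disk per file)."""
    return [{f % num_disks} for f in range(num_files)]


def available_fraction(placement, failed_disks, num_disks):
    """Fraction of files that remain fully readable.

    Assumption: metadata is replicated on every disk, so it is reachable
    as long as at least one disk survives. Data blocks have no redundancy,
    so a file is readable only if none of its disks has failed.
    """
    metadata_ok = len(failed_disks) < num_disks
    if not metadata_ok:
        return 0.0
    readable = sum(1 for disks in placement if not (disks & failed_disks))
    return readable / len(placement)


# Hypothetical configuration: 4 disks, 100 files of 8 blocks each,
# and two simultaneous disk failures.
NUM_DISKS = 4
NUM_FILES = 100
BLOCKS_PER_FILE = 8
failed = {0, 1}

striped = place_striped(NUM_FILES, BLOCKS_PER_FILE, NUM_DISKS)
isolated = place_fault_isolated(NUM_FILES, NUM_DISKS)

striped_avail = available_fraction(striped, failed, NUM_DISKS)
isolated_avail = available_fraction(isolated, failed, NUM_DISKS)

print(f"striped:        {striped_avail:.0%} of files readable")
print(f"fault-isolated: {isolated_avail:.0%} of files readable")
```

In this model every striped file touches every disk, so any failure beyond what redundancy covers makes all files unreadable, whereas fault-isolated placement loses only the files that lived on the failed disks. The model ignores performance, parity, and recovery, which the paper treats in detail.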

Reviews

Eno Thereska

The authors of this paper have significant experience with designing "semantically smart" storage systems. These systems make educated guesses about how higher-level applications (such as file systems and databases) work, and help them run more efficiently. This paper addresses the availability of file system data. Not all file system data has equal importance. Losing small amounts of metadata, for example, could disable access to whole directories and their files. D-GRAID automatically infers which file system blocks correspond to metadata and uses a higher degree of replication for them. It also infers which files are most commonly used and replicates them for better performance and higher availability. During failure, these files are also recovered first.

The main strength of D-GRAID is its nonintrusive way of detecting the file system type and the important data blocks that need special treatment. It does this without requiring file system changes, and the authors show that their method works with diverse file systems (Linux- and Windows-based).

The main drawback of this approach is its limited flexibility. It may also be important to differentiate data by other metrics, such as cost. Financial documents, for example, are generally considered more important than music files; although music files may be accessed more often, financial data should be more available and recovered first. As the authors mention, D-GRAID can indeed be used with existing redundancy mechanisms such as redundant array of independent disks (RAID). The authors hint that more important and less important data could be stored using different RAID schemes in the first place. However, that raises the question of why the metadata and other important file system-related data blocks could not also use different RAID levels than less important data (RAID is possible in software too). This way, instead of the storage system inferring partial information about applications, the applications themselves decide how and where to store the data.

Online Computing Reviews Service
