research-article

Online availability upgrades for parity-based RAIDs through supplementary parity augmentations

Published: 02 June 2011

Abstract

In this article, we propose a simple but powerful online availability upgrade mechanism, Supplementary Parity Augmentations (SPA), to address the availability issue in parity-based RAID systems. The basic idea of SPA is to store and update supplementary parity units on one or a few newly added spare disks while the RAID system remains online in operational mode. This improves reconstruction performance and lets the array tolerate multiple disk failures and latent sector errors simultaneously. By applying exclusive OR operations appropriately among supplementary parity, full parity, and data units, SPA can reconstruct the data on failed disks with a fraction of the original overhead, a fraction that is proportional to the supplementary parity coverage. This significantly reduces the overhead of data regeneration and shortens recovery time in parity-based RAID systems. Our extensive trace-driven simulation study shows that SPA can significantly improve the reconstruction performance of RAID5 and RAID5+0 systems, at an acceptable performance overhead in operational mode. Moreover, our analytical reliability modeling and sequential Monte-Carlo simulation demonstrate that SPA consistently more than doubles the MTTDL of the RAID5 system and noticeably improves the reliability of the RAID5+0 system.
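The XOR relationship among supplementary parity, full parity, and data units can be illustrated with a small sketch. It is not the authors' implementation: the stripe of four data units, the choice to let the supplementary parity cover the first half of the stripe, and the names `xor_blocks` and `reconstruct` are all assumptions made for illustration.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical stripe: four data units, one per data disk.
data = [b"\x11\x22", b"\x33\x44", b"\x55\x66", b"\x77\x88"]

# Full parity, as in RAID5: XOR of every data unit in the stripe.
full_parity = xor_blocks(*data)

# Assumption: the supplementary parity covers only units 0 and 1.
covered = (0, 1)
supp_parity = xor_blocks(*(data[i] for i in covered))

def reconstruct(failed: int):
    """Rebuild the unit at index `failed`; return (block, units_read)."""
    if failed in covered:
        # Covered unit: XOR the supplementary parity with the surviving
        # covered units. Uncovered units and full parity are not read.
        sources = [data[i] for i in covered if i != failed]
        sources.append(supp_parity)
    else:
        # Uncovered unit: full parity XOR supplementary parity equals the
        # XOR of the uncovered units, so the covered units are not read.
        sources = [data[i] for i in range(len(data))
                   if i not in covered and i != failed]
        sources += [full_parity, supp_parity]
    return xor_blocks(*sources), len(sources)

def reconstruct_raid5(failed: int):
    """Baseline: rebuild from all surviving data units plus full parity."""
    sources = [d for i, d in enumerate(data) if i != failed]
    sources.append(full_parity)
    return xor_blocks(*sources), len(sources)
```

In this toy stripe, rebuilding a covered unit reads 2 units and rebuilding an uncovered unit reads 3, against 4 for the plain RAID5 baseline. The saving grows with stripe width, which is one way to read the abstract's statement that the overhead is proportional to the supplementary parity coverage.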

Published in

ACM Transactions on Storage, Volume 6, Issue 4
May 2011
72 pages
ISSN: 1553-3077
EISSN: 1553-3093
DOI: 10.1145/1970338

Copyright © 2011 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 2 June 2011
• Accepted: 1 November 2010
• Revised: 1 August 2010
• Received: 1 April 2010

Qualifiers

• research-article
• Research
• Refereed
