Abstract
In this article, we propose a simple but powerful online availability upgrade mechanism, Supplementary Parity Augmentations (SPA), to address the availability issue in parity-based RAID systems. The basic idea of SPA is to store and update supplementary parity units on one or a few newly added spare disks while the RAID system remains online in operational mode, thereby improving reconstruction performance while tolerating multiple disk failures and latent sector errors simultaneously. By applying exclusive-OR operations appropriately among supplementary parity, full parity, and data units, SPA can reconstruct the data on failed disks with a fraction of the original overhead, proportional to the supplementary parity coverage, which significantly reduces the overhead of data regeneration and shortens recovery time in parity-based RAID systems. Our extensive trace-driven simulation study shows that SPA significantly improves the reconstruction performance of RAID5 and RAID5+0 systems at an acceptable performance overhead in operational mode. Moreover, our analytical reliability modeling and sequential Monte Carlo simulation demonstrate that SPA consistently more than doubles the mean time to data loss (MTTDL) of the RAID5 system and noticeably improves the reliability of the RAID5+0 system.
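The abstract does not spell out the parity layout, so the following is only a minimal sketch of the XOR idea under stated assumptions: one stripe with four data units, a full parity over all of them, and a supplementary parity covering a fixed subset (`covered`). The names `xor_units`, `reconstruct`, and `covered` are hypothetical and not taken from the paper. If the lost unit lies inside the covered subset, it can be rebuilt from the supplementary parity and the other covered units; otherwise, XORing the full and supplementary parities yields the parity of the uncovered subset, and only the uncovered survivors need to be read.

```python
from functools import reduce


def xor_units(units):
    """Byte-wise XOR of a non-empty list of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*units))


def reconstruct(failed, data, full_parity, supp_parity, covered):
    """Rebuild data[failed] for one stripe (illustrative layout, an assumption).

    data        -- list of data units; the entry at index `failed` is ignored
    full_parity -- XOR of all data units (the RAID5 parity)
    supp_parity -- XOR of the data units whose indices are in `covered`
    covered     -- set of data-unit indices covered by the supplementary parity
    Returns (rebuilt_unit, number_of_units_read).
    """
    if failed in covered:
        # Lost unit = supplementary parity XOR the other covered units.
        peers = [data[i] for i in sorted(covered) if i != failed]
        reads = peers + [supp_parity]
    else:
        # full XOR supplementary = parity of the uncovered units only.
        peers = [data[i] for i in range(len(data))
                 if i not in covered and i != failed]
        reads = peers + [full_parity, supp_parity]
    return xor_units(reads), len(reads)


# Usage: a four-unit stripe with half of the units covered.
stripe = [bytes([i, i + 7, 255 - i]) for i in range(4)]
half = {0, 1}
parity = xor_units(stripe)
supplementary = xor_units([stripe[i] for i in sorted(half)])

rebuilt, units_read = reconstruct(0, stripe, parity, supplementary, half)
```

In this toy stripe, plain RAID5 reconstruction would read every surviving data unit plus the full parity, whereas the sketch reads only the units on one side of the coverage boundary plus the relevant parity units. How the coverage is actually chosen and placed across spare disks is left to the paper.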
Index Terms
Online availability upgrades for parity-based RAIDs through supplementary parity augmentations