Abstract
Practical storage systems often adopt erasure codes to tolerate device failures and sector failures, both of which are prevalent in the field. However, traditional erasure codes employ device-level redundancy to protect against sector failures, and hence incur significant space overhead. Recent sector-disk (SD) codes are available only for limited configurations. By making a relaxed but practical assumption, we construct a general family of erasure codes called STAIR codes, which efficiently and provably tolerate both device and sector failures without any restriction on the size of a storage array and the numbers of tolerable device failures and sector failures. We propose the upstairs encoding and downstairs encoding methods, which provide complementary performance advantages for different configurations. We conduct extensive experiments on STAIR codes in terms of space saving, encoding/decoding speed, and update cost. We demonstrate that STAIR codes not only improve space efficiency over traditional erasure codes, but also provide better computational efficiency than SD codes based on our special code construction. Finally, we present analytical models that characterize the reliability of STAIR codes, and show that the support of a wider range of configurations by STAIR codes is critical for tolerating sector failure bursts discovered in the field.
STAIR Codes: A General Family of Erasure Codes for Tolerating Device and Sector Failures