Abstract
We design flexible schemes to explore the tradeoffs between storage space and access efficiency in reliable data storage systems. To this end, we introduce two new classes of erasure-resilient codes (ERCs): Basic Pyramid Codes (BPC) and Generalized Pyramid Codes (GPC). Both schemes require slightly more storage space than conventional schemes, but significantly improve read performance during failures and unavailability, which is the critical metric in those situations.
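The abstract does not spell out the construction, so the following is a minimal sketch of the Basic Pyramid Code idea under stated assumptions. It starts from an (11, 8) MDS code with three parities, splits the 8 data blocks into two groups of 4, and replaces the first parity with two group-local parities (one partial sum per group), keeping the other two parities as global parities, for 12 blocks in total. Arithmetic is done over the prime field GF(257) purely for readability (practical systems typically use GF(2^8)), and the Cauchy coefficients and the names `encode_bpc` and `degraded_read` are hypothetical. The sketch shows the tradeoff described above: one extra block of storage lets a degraded read of a single lost data block touch 4 blocks instead of 8.

```python
from typing import List, Tuple

P = 257          # prime field modulus (assumption: real systems typically use GF(2^8))
K, M = 8, 3      # 8 data blocks, 3 parities in the underlying (11, 8) MDS code
GROUPS = [range(0, 4), range(4, 8)]  # two data groups of 4 blocks each


def inv(a: int) -> int:
    """Multiplicative inverse in GF(P) via Fermat's little theorem."""
    return pow(a % P, P - 2, P)


# Cauchy coefficients 1 / (x_r - y_j) with x_r = r and y_j = M + j; every
# square submatrix of a Cauchy matrix is nonsingular, so [I | C] is MDS.
COEF = [[inv(r - (M + j)) for j in range(K)] for r in range(M)]


def encode_bpc(data: List[int]) -> Tuple[List[int], List[int]]:
    """Return (local parities, global parities) of the hypothetical (12, 8) pyramid code."""
    # Split the first MDS parity into one partial sum per group.
    local = [sum(COEF[0][j] * data[j] for j in g) % P for g in GROUPS]
    # Keep the remaining MDS parities unchanged as global parities.
    glob = [sum(COEF[r][j] * data[j] for j in range(K)) % P for r in range(1, M)]
    return local, glob


def degraded_read(i: int, data: List[int], local: List[int]) -> Tuple[int, int]:
    """Rebuild data[i] from its own group only; return (value, blocks read)."""
    g = next(n for n, grp in enumerate(GROUPS) if i in grp)
    peers = [j for j in GROUPS[g] if j != i]          # surviving data in the group
    partial = sum(COEF[0][j] * data[j] for j in peers) % P
    value = (local[g] - partial) * inv(COEF[0][i]) % P
    return value, len(peers) + 1                      # peers plus one local parity
```

For example, with `data = [10, 20, 30, 40, 50, 60, 70, 80]`, `degraded_read(2, data, local)` returns `(30, 4)`: block 2 is rebuilt from three peers and one local parity. Because the two local parities sum to the first parity of the original MDS code, that parity can always be reconstructed from them when both survive.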
As a by-product, we establish a necessary matching condition that characterizes the limit of failure recovery: unless the matching condition is satisfied, a failure case cannot be recovered. In addition, we define a maximally recoverable (MR) property. For every ERC scheme that holds the MR property, the matching condition is also sufficient; that is, every failure case that satisfies the matching condition is indeed recoverable. We show that GPC is the first class of non-MDS schemes (codes that are not maximum distance separable) to hold the MR property.
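The matching condition can be made concrete with a small check. This sketch assumes one natural formulation: build a bipartite graph between the failed data blocks and the surviving parity blocks, with an edge whenever a parity depends on that data block, and require a matching that covers every failed data block. Necessity follows from Hall's theorem: if no such matching exists, some set of failed data blocks is covered by fewer surviving parities than it has members, leaving more unknowns than independent equations. The (12, 8) layout (local parities `L1`, `L2`, global parities `G1`, `G2`) and the function names are hypothetical; the maximum matching uses Kuhn's augmenting-path algorithm.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical (12, 8) pyramid layout: which data blocks each parity depends on.
COVERAGE: Dict[str, Set[int]] = {
    "L1": {0, 1, 2, 3},      # local parity of group 1
    "L2": {4, 5, 6, 7},      # local parity of group 2
    "G1": set(range(8)),     # global parities cover every data block
    "G2": set(range(8)),
}


def max_matching(failed_data: List[int], parities: Dict[str, Set[int]]) -> int:
    """Size of a maximum matching between failed data and surviving parities (Kuhn)."""
    owner: Dict[str, int] = {}  # parity name -> data block it is matched to

    def augment(d: int, seen: Set[str]) -> bool:
        for p, covered in parities.items():
            if d in covered and p not in seen:
                seen.add(p)
                # Use p if free, or if its current owner can be rematched elsewhere.
                if p not in owner or augment(owner[p], seen):
                    owner[p] = d
                    return True
        return False

    return sum(augment(d, set()) for d in failed_data)


def satisfies_matching_condition(failed_data: List[int], failed_parities: Iterable[str]) -> bool:
    """Necessary condition: every failed data block gets a distinct surviving parity."""
    lost = set(failed_parities)
    surviving = {p: c for p, c in COVERAGE.items() if p not in lost}
    return max_matching(failed_data, surviving) == len(failed_data)
```

Here `satisfies_matching_condition([0, 1, 2, 3], [])` is `False`: four failed blocks in one group can draw only on `L1`, `G1`, and `G2`, so the case is unrecoverable whatever coefficients are chosen. The pattern `[0, 1, 4, 5]` passes; for a code holding the MR property, passing would also guarantee that it is recoverable.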
Pyramid Codes: Flexible Schemes to Trade Space for Access Efficiency in Reliable Data Storage Systems