Abstract
Erasure codes are widely used in distributed storage systems to prevent data loss. Traditional codes suffer from the repair-bandwidth problem: the amount of data required to reconstruct lost data, referred to as the repair bandwidth, is often far larger than the theoretical minimum. Although many codes have been proposed in recent years to reduce the repair bandwidth, they either incur extra storage and computation overhead or apply only to special cases.
To address these weaknesses of existing solutions to the repair-bandwidth problem, we propose Z codes, a general family of codes that achieve the theoretical lower bound on the trade-off between repair bandwidth and storage. To the best of our knowledge, Z codes are the first general systematic erasure codes that jointly achieve optimal repair bandwidth and storage. Further, we generalize Z codes to GZ codes to obtain the Maximum Distance Separable (MDS) property. Our evaluation on a real system indicates that Z/GZ codes and Reed-Solomon (RS) codes have comparable encoding and repair speeds, while GZ codes reduce response time by over 37.5% when repairing the same amount of data, compared with RS and Cauchy Reed-Solomon (CRS) codes.
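The gap between conventional repair and the theoretical minimum can be illustrated with a small calculation. The sketch below is not taken from the paper: it assumes the "theoretical minimum" is the cut-set bound at the minimum-storage point from the regenerating-codes literature, and the function names and the (n, k) = (14, 10) parameters are hypothetical choices for illustration only.

```python
from fractions import Fraction


def conventional_repair_bandwidth(file_size, k):
    """Data read to rebuild one lost block with a conventional MDS code.

    The file is split into k blocks of size file_size / k, and repairing
    a single block requires downloading k whole surviving blocks.
    """
    block_size = Fraction(file_size, k)
    return k * block_size


def minimum_storage_repair_bound(file_size, k, d):
    """Cut-set lower bound on repair bandwidth at the minimum-storage point.

    Assumes each node stores file_size / k and the replacement node
    contacts d helper nodes (k <= d). This is the bound from the
    regenerating-codes literature, used here as an assumption about
    what the abstract calls the theoretical minimum.
    """
    if d < k:
        raise ValueError("d must be at least k")
    return Fraction(file_size * d, k * (d - k + 1))


# Hypothetical example: n = 14 nodes, k = 10 data blocks, file of 10 units,
# and all n - 1 = 13 surviving nodes acting as helpers.
n, k, file_size = 14, 10, 10
conventional = conventional_repair_bandwidth(file_size, k)
bound = minimum_storage_repair_bound(file_size, k, d=n - 1)
print(f"conventional repair: {conventional} units")
print(f"lower bound:         {bound} units")
print(f"potential saving:    {float(1 - bound / conventional):.1%}")
```

These figures describe data transferred under the stated assumptions, so they are not comparable to the response-time reduction reported in the evaluation.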