Systematic Erasure Codes with Optimal Repair Bandwidth and Storage

Published: 28 September 2017

Abstract

Erasure codes are widely used in distributed storage systems to prevent data loss. Traditional codes suffer from the repair-bandwidth problem: the amount of data required to reconstruct lost data, referred to as the repair bandwidth, is often far greater than the theoretical minimum. Although many novel codes have been proposed in recent years to reduce the repair bandwidth, these codes either incur extra storage and computation overhead or apply only to special cases.
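To make the gap concrete, the sketch below compares the repair traffic of a conventional (n, k) MDS code, which reads k surviving blocks to rebuild one lost block, with the cut-set lower bound at the minimum-storage point known from the regenerating-codes literature. The abstract itself gives no formula or parameters; the bound, the (14, 10) configuration, the 1000 MB file size, and the function names are assumptions chosen purely for illustration, and nothing here describes how Z codes are constructed.

```python
from fractions import Fraction


def conventional_repair_bandwidth(file_size, k):
    """Traffic to rebuild one block with a conventional (n, k) MDS code.

    Each block holds file_size / k, and k surviving blocks must be read,
    so the traffic equals the whole file size.
    """
    return Fraction(file_size, k) * k


def min_storage_repair_bound(file_size, k, d):
    """Cut-set lower bound on repair traffic at the minimum-storage point.

    d is the number of helper nodes contacted (k <= d). The bound is
    file_size * d / (k * (d - k + 1)).
    """
    if d < k:
        raise ValueError("need at least k helper nodes")
    return Fraction(file_size * d, k * (d - k + 1))


# Illustrative parameters (assumed, not taken from the paper).
n, k = 14, 10
file_size_mb = 1000
d = n - 1  # contact every surviving node

conventional = conventional_repair_bandwidth(file_size_mb, k)
bound = min_storage_repair_bound(file_size_mb, k, d)

print(conventional)  # 1000
print(bound)         # 325
```

Under these assumed parameters, rebuilding a single 100 MB block with a conventional code moves the full 1000 MB, while the bound allows as little as 325 MB. With d = k the two quantities coincide, so the saving comes from contacting more helpers and downloading less from each.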

To address the weaknesses of existing solutions to the repair-bandwidth problem, we propose Z codes, a general family of codes capable of achieving the theoretical lower bound of repair bandwidth versus storage. To the best of our knowledge, Z codes are the first general systematic erasure codes that jointly achieve optimal repair bandwidth and storage. Further, we generalize Z codes to GZ codes to gain the Maximum Distance Separable (MDS) property. Our evaluation on a real system indicates that Z/GZ codes and Reed-Solomon (RS) codes have comparable encoding and repair speeds, while GZ codes reduce response time by over 37.5% when repairing the same amount of data, compared to RS and Cauchy Reed-Solomon (CRS) codes.

          Published in

            ACM Transactions on Storage, Volume 13, Issue 3
            Special Issue on FAST 2017 and Regular Papers
            August 2017
            265 pages
            ISSN:1553-3077
            EISSN:1553-3093
            DOI:10.1145/3141876
            Editor: Sam H. Noh

            Copyright © 2017 ACM

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 28 September 2017
            • Revised: 1 June 2017
            • Accepted: 1 June 2017
            • Received: 1 July 2015


            Qualifiers

            • research-article
            • Research
            • Refereed
