
High-Performance General Functional Regenerating Codes with Near-Optimal Repair Bandwidth

Published: 10 June 2017

Abstract

Erasure codes are widely used in modern distributed storage systems to protect against data loss caused by server failures. Regenerating codes are a class of erasure codes that trade storage efficiency and computation for reduced repair bandwidth. However, their nonunified coding parameters and high computational overhead hinder their adoption. We therefore first propose a family of General Functional Regenerating (GFR) codes with uncoded repair, which balance storage efficiency and repair bandwidth under general parameters. GFR codes rely on a heuristic repair algorithm that tries to use as little repair bandwidth as possible when repairing a single failure. Second, we present a scheduled shift multiplication (SSM) algorithm, which accelerates matrix products over the Galois field by scheduling the order of coding operations, so that encoding and repair of GFR codes can be executed with fast bitwise shifts and exclusive-OR operations. Compared to the traditional table-lookup multiplication algorithm, our SSM algorithm achieves a 1.2× to 2× speedup in our experimental evaluations, with little effect on the repair success rate.
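
To make the contrast between table-lookup and shift/XOR multiplication concrete, the following is a minimal sketch of both styles for a single product in GF(2^8). It is not the SSM algorithm itself, whose scheduling is described only in the full paper. The field size, the reduction polynomial 0x11D, and the function names `gf_mul_shift_xor`, `build_tables`, and `gf_mul_table` are assumptions chosen for illustration.

```python
# Illustrative sketch only: generic GF(2^8) multiplication, not the paper's
# SSM scheduling. POLY and all function names are assumptions.
POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1, a common choice for GF(2^8)


def gf_mul_shift_xor(a: int, b: int) -> int:
    """Multiply a and b in GF(2^8) using only bit shifts and XORs."""
    result = 0
    while b:
        if b & 1:          # lowest bit of b set: add (XOR) the current a
            result ^= a
        a <<= 1            # multiply a by x
        if a & 0x100:      # degree reached 8: reduce modulo POLY
            a ^= POLY
        b >>= 1
    return result


def build_tables():
    """Build exp/log tables for the generator element 2."""
    exp = [0] * 510        # doubled so LOG[a] + LOG[b] never needs a modulo
    log = [0] * 256
    x = 1
    for i in range(255):
        exp[i] = x
        log[x] = i
        x <<= 1
        if x & 0x100:
            x ^= POLY
    for i in range(255, 510):
        exp[i] = exp[i - 255]
    return exp, log


EXP, LOG = build_tables()


def gf_mul_table(a: int, b: int) -> int:
    """Multiply a and b in GF(2^8) with log/exp table lookups."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]
```

Both functions compute the same product. The table version costs three memory lookups per multiplication, while the shift/XOR version uses only register operations, which is the kind of primitive the abstract says encoding and repair are reduced to.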

Published in

ACM Transactions on Storage, Volume 13, Issue 2
Special Issue on MSST 2016 and Regular Papers
May 2017
199 pages
ISSN: 1553-3077
EISSN: 1553-3093
DOI: 10.1145/3098275
Editor: Sam H. Noh

Copyright © 2017 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 10 June 2017
• Accepted: 1 February 2017
• Revised: 1 December 2016
• Received: 1 September 2015

Qualifiers

• research-article
• Research
• Refereed
