research-article

Hybrid Codes: Flexible Erasure Codes with Optimized Recovery Performance

Published: 24 September 2020

Abstract

Erasure codes are widely deployed in practical storage systems to prevent data loss with low redundancy. However, these codes incur excessive disk I/O and network traffic when recovering unavailable data. Among all erasure codes, Minimum Storage Regenerating (MSR) codes achieve the optimal repair bandwidth at the minimum storage overhead, but several open issues must be addressed before they can be applied in real systems.
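To make the repair-bandwidth gap concrete, the following minimal sketch compares the traffic needed to rebuild one lost block under conventional MDS repair (as in Reed-Solomon codes) with the MSR lower bound from the regenerating-codes literature. The function names and the (14, 10) parameters are illustrative assumptions, not taken from the article, and the sketch does not model Hybrid-RC itself.

```python
from fractions import Fraction

def rs_repair_traffic(k, block_size):
    """Conventional MDS repair (e.g. Reed-Solomon): read k whole blocks."""
    return Fraction(k) * block_size

def msr_repair_traffic(k, d, block_size):
    """MSR bound: each of d helpers sends block_size / (d - k + 1)."""
    if d < k:
        raise ValueError("MSR repair needs at least k helper nodes")
    return Fraction(d, d - k + 1) * block_size

# Hypothetical (n, k) = (14, 10) code, one failed node, all survivors help.
n, k = 14, 10
d = n - 1
print(rs_repair_traffic(k, 1))      # 10 blocks
print(msr_repair_traffic(k, d, 1))  # 13/4 blocks
```

Under these assumed parameters, MSR repair moves 3.25 blocks instead of 10, which is the kind of saving the abstract refers to.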

Given the heavy burden that recovery imposes, erasure-coded storage systems must be designed for high repair efficiency. To this end, we introduce a new class of coding schemes, Hybrid Regenerating Codes (Hybrid-RC). These codes exploit the strengths of MSR codes to compute a subset of data blocks, while some other parity blocks are used for reliability maintenance. As a result, our design is near-optimal with respect to storage and network traffic and substantially improves recovery performance.

Published in

ACM Transactions on Storage, Volume 16, Issue 4
Special Section on Computational Storage and Regular Papers
November 2020
185 pages
ISSN: 1553-3077
EISSN: 1553-3093
DOI: 10.1145/3426401
Editor: Sam H. Noh

Copyright © 2020 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 24 September 2020
• Accepted: 1 June 2020
• Revised: 1 April 2020
• Received: 1 November 2018

Qualifiers

• research-article
• Research
• Refereed
