Abstract
Erasure codes are widely deployed in practical storage systems to prevent data loss with low redundancy. However, these codes require excessive disk I/O and network traffic to recover unavailable data. Among erasure codes, Minimum Storage Regenerating (MSR) codes achieve the optimal repair bandwidth at the minimum storage overhead, but open issues remain to be addressed before they can be applied in real systems.
Given the heavy burden that recovery imposes, erasure-coded storage systems need high repair efficiency. To this end, we introduce a new class of coding schemes, Hybrid Regenerating Codes (Hybrid-RC). These codes exploit the strengths of MSR codes to compute a subset of data blocks, while other parity blocks are used to maintain reliability. As a result, our design is near-optimal with respect to storage and network traffic and substantially improves recovery performance.
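To make the phrase "optimal repair bandwidth under the minimum storage" concrete, the sketch below compares the network traffic needed to repair one lost block under a conventional MDS code (such as Reed-Solomon) and under an MSR code. It uses the standard MSR-point result that each of d helper nodes sends a 1/(d - k + 1) fraction of its block. The function names, the (14, 10) parameters, and the 256 MB block size are illustrative assumptions and are not taken from the paper. The sketch does not model Hybrid-RC itself.

```python
from fractions import Fraction


def mds_repair_traffic(k: int, block_size: Fraction) -> Fraction:
    """Conventional MDS repair: download k whole surviving blocks."""
    return k * block_size


def msr_repair_traffic(n: int, k: int, d: int, block_size: Fraction) -> Fraction:
    """MSR repair: each of d helpers sends block_size / (d - k + 1).

    Requires k <= d <= n - 1 (helpers are surviving nodes).
    """
    if not (k <= d <= n - 1):
        raise ValueError("need k <= d <= n - 1")
    return d * block_size / (d - k + 1)


if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration.
    n, k, d = 14, 10, 13
    block = Fraction(256)  # block size in MB (assumed)

    mds = mds_repair_traffic(k, block)
    msr = msr_repair_traffic(n, k, d, block)
    print(f"MDS repair traffic: {mds} MB")
    print(f"MSR repair traffic: {msr} MB")
    print(f"MSR / MDS ratio:    {float(msr / mds):.3f}")
```

With d = k the MSR expression reduces to the conventional cost of k blocks; the savings come from contacting more helpers and downloading less from each.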
Index Terms
Hybrid Codes: Flexible Erasure Codes with Optimized Recovery Performance