Abstract
Erasure codes are widely used in modern distributed storage systems to protect against data loss caused by server failures. Regenerating codes are a class of erasure codes that trade storage efficiency and computation for reduced repair bandwidth. However, their non-unified coding parameters and heavy computational overhead hinder practical adoption. To address this, we first propose a family of General Functional Regenerating (GFR) codes with uncoded repair, which balance storage efficiency and repair bandwidth under general parameters. GFR codes rely on a heuristic repair algorithm that tries to use as little repair bandwidth as possible when repairing a single failure. Second, we present a scheduled shift multiplication (SSM) algorithm, which accelerates matrix products over the Galois field by scheduling the order of coding operations, so that encoding and repair of GFR codes can be executed with fast bitwise shifts and exclusive-OR operations. Compared with the traditional table-lookup multiplication algorithm, SSM achieves a 1.2× to 2× speedup in our experimental evaluations, with little effect on the repair success rate.
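To make the contrast between table-lookup and shift-and-XOR multiplication concrete, the following is a minimal sketch of both approaches for a single product in GF(2^8). It is not the SSM algorithm itself, whose scheduling of coding operations is not described in the abstract. The field size, the polynomial 0x11D, and the function names `gf_mul_shift_xor` and `gf_mul_table` are assumptions chosen for illustration only.

```python
# Illustrative only: generic GF(2^8) multiplication, NOT the paper's SSM scheduler.
# Assumption: field polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D), for which 2 is a generator.
POLY = 0x11D


def gf_mul_shift_xor(a: int, b: int) -> int:
    """Multiply a and b in GF(2^8) using only bitwise shifts and XORs."""
    result = 0
    while b:
        if b & 1:          # lowest bit of b set: add (XOR) the current multiple of a
            result ^= a
        a <<= 1            # multiply a by x
        if a & 0x100:      # degree reached 8: reduce modulo the field polynomial
            a ^= POLY
        b >>= 1
    return result


def build_tables():
    """Build exp/log tables over the generator 2 for table-lookup multiplication."""
    exp = [0] * 510        # doubled length so LOG[a] + LOG[b] never needs a modulo
    log = [0] * 256
    x = 1
    for i in range(255):
        exp[i] = x
        log[x] = i
        x = gf_mul_shift_xor(x, 2)
    for i in range(255, 510):
        exp[i] = exp[i - 255]
    return exp, log


EXP, LOG = build_tables()


def gf_mul_table(a: int, b: int) -> int:
    """Multiply a and b in GF(2^8) with log/exp table lookups."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


if __name__ == "__main__":
    # Both methods must agree on every pair of field elements.
    assert all(
        gf_mul_shift_xor(a, b) == gf_mul_table(a, b)
        for a in range(256)
        for b in range(256)
    )
    print(hex(gf_mul_shift_xor(0x02, 0x80)))  # 0x1d
```

The table version costs three memory lookups per product, while the shift-and-XOR version uses only register operations. Reordering many such operations across a whole coding matrix is the kind of work a scheduler like SSM would perform, although the details here are not taken from the paper.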
Index Terms
High-Performance General Functional Regenerating Codes with Near-Optimal Repair Bandwidth