Optimal Repair Layering for Erasure-Coded Data Centers: From Theory to Practice

Published: 14 November 2017

Abstract

Repair performance in hierarchical data centers is often bottlenecked by cross-rack network transfer. Recent theoretical results show that the cross-rack repair traffic can be minimized through repair layering, whose idea is to partition a repair operation into inner-rack and cross-rack layers. However, how repair layering should be implemented and deployed in practice remains an open issue. In this article, we address this issue by proposing a practical repair layering framework called DoubleR. We design two families of practical double regenerating codes (DRC), which not only minimize the cross-rack repair traffic but also have several practical properties that improve state-of-the-art regenerating codes. We implement and deploy DoubleR atop the Hadoop Distributed File System (HDFS) and show that DoubleR maintains the theoretical guarantees of DRC and improves the repair performance of regenerating codes in both node recovery and degraded read operations.


Published in

ACM Transactions on Storage, Volume 13, Issue 4
Special Issue on MSST 2017 and Regular Papers
November 2017
329 pages
ISSN: 1553-3077
EISSN: 1553-3093
DOI: 10.1145/3160863
Editor: Sam H. Noh

Copyright © 2017 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 14 November 2017
• Accepted: 1 September 2017
• Revised: 1 August 2017
• Received: 1 March 2017

        Qualifiers

        • research-article
        • Research
        • Refereed
