
Repair Pipelining for Erasure-coded Storage: Algorithms and Evaluation

Published: 28 May 2021

Abstract

We propose repair pipelining, a technique that speeds up repair in general erasure-coded storage. By scheduling the repair of failed data in small units across storage nodes in a pipelined manner, repair pipelining reduces the time to repair a single block to approximately the time of a normal read of a single block in homogeneous environments. We further extend repair pipelining with algorithms for heterogeneous environments and for multi-block repair. We implement a repair pipelining prototype, called ECPipe, and integrate it as a middleware system into two versions of the Hadoop Distributed File System (HDFS), namely HDFS-RAID and HDFS-3, as well as into the Quantcast File System (QFS). Experiments on a local testbed and on Amazon EC2 show that repair pipelining significantly improves the performance of degraded reads and full-node recovery over existing repair techniques.
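The claim that pipelining brings single-block repair close to a normal block read can be illustrated with a simplified timing model. The sketch below is not ECPipe's code or API; all function names are hypothetical, and it assumes: k helpers arranged in a chain that ends at the requestor, equal bandwidth B on every link, full-duplex transfers, negligible computation and disk I/O, and a single-parity XOR code standing in for a general erasure code. With a Reed–Solomon code, each helper would instead multiply its slice by a decoding coefficient before adding it to the partial result.

```python
from __future__ import annotations

import os
from functools import reduce


def conventional_repair_time(k: int, block_mb: float, bw_mb_per_s: float) -> float:
    """Requestor downloads k whole blocks through its single downlink."""
    return k * block_mb / bw_mb_per_s


def pipelined_repair_time(k: int, s: int, block_mb: float, bw_mb_per_s: float) -> float:
    """Block split into s slices relayed along a chain of k helpers (assumed model).

    One timeslot moves one slice over one link. The first slice needs k hops
    (k - 1 helper-to-helper links plus the link to the requestor); after that,
    one repaired slice reaches the requestor per timeslot: s + k - 1 slots total.
    """
    timeslot = (block_mb / s) / bw_mb_per_s
    return (s + k - 1) * timeslot


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def pipelined_xor_repair(helper_blocks: list[bytes], s: int) -> bytes:
    """Rebuild a lost block of a single-parity (XOR) stripe slice by slice.

    For each slice, a zero-filled partial result enters the chain; each helper
    XORs in its own slice and forwards it; the last hop delivers the repaired
    slice. This loop reproduces the data flow only: in a real pipeline,
    different slices occupy different links at the same time.
    """
    if s < 1:
        raise ValueError("s must be at least 1")
    size = len(helper_blocks[0])
    slice_len = max(1, -(-size // s))  # ceiling division
    repaired = bytearray()
    for lo in range(0, size, slice_len):
        hi = min(lo + slice_len, size)
        partial = bytes(hi - lo)  # zero slice injected at the head of the chain
        for block in helper_blocks:  # one hop per helper
            partial = xor_bytes(partial, block[lo:hi])
        repaired += partial
    return bytes(repaired)


if __name__ == "__main__":
    k = 6
    data = [os.urandom(1024) for _ in range(k)]
    parity = reduce(xor_bytes, data)  # single parity block
    helpers = data[1:] + [parity]     # data[0] is "lost"; k helpers remain
    assert pipelined_xor_repair(helpers, s=8) == data[0]

    block_mb, bw = 64.0, 125.0  # illustrative: 64 MB block, 1 Gb/s links
    print("normal read :", block_mb / bw)
    print("conventional:", conventional_repair_time(k, block_mb, bw))
    print("pipelined   :", pipelined_repair_time(k, 1024, block_mb, bw))
```

Under these assumptions the pipelined time is (1 + (k − 1)/s) · b/B for block size b, so it approaches the normal read time b/B as the number of slices s grows, while s = 1 collapses to the same k · b/B as conventional repair. Per-slice overheads, which the model ignores, keep s from growing without bound in practice.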



      Published in

        ACM Transactions on Storage, Volume 17, Issue 2
        May 2021
        202 pages
        ISSN: 1553-3077
        EISSN: 1553-3093
        DOI: 10.1145/3465461
        Editor: Sam H. Noh

        Copyright © 2021 Association for Computing Machinery.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 28 May 2021
        • Accepted: 1 November 2020
        • Revised: 1 August 2020
        • Received: 1 July 2019


        Qualifiers

        • research-article
        • Refereed
