skip to main content
research-article

Pyramid Codes: Flexible Schemes to Trade Space for Access Efficiency in Reliable Data Storage Systems

Published:01 March 2013Publication History
Skip Abstract Section

Abstract

We design flexible schemes to explore the tradeoffs between storage space and access efficiency in reliable data storage systems. Aiming at this goal, two new classes of erasure-resilient codes are introduced -- Basic Pyramid Codes (BPC) and Generalized Pyramid Codes (GPC). Both schemes require slightly more storage space than conventional schemes, but significantly improve the critical performance of read during failures and unavailability.

As a by-product, we establish a necessary matching condition to characterize the limit of failure recovery, that is, unless the matching condition is satisfied, a failure case is impossible to recover. In addition, we define a maximally recoverable (MR) property. For all ERC schemes holding the MR property, the matching condition becomes sufficient, that is, all failure cases satisfying the matching condition are indeed recoverable. We show that GPC is the first class of non-MDS schemes holding the MR property.

References

  1. Abd-El-Malek, M., W. V. C. Ii, Cranor, C., Ganger, G. R., Hendricks, J., Klosterman, A. J., Mesnier, M., Prasad, M., Salmon, B., Sambasivan, R. R., Sinnamohideen, S., Strunk, J. D., Thereska, E., Wachs, M., and Wylie, J. J. 2005. Ursa minor: Versatile cluster-based storage. In Proceedings of USENIX Conference on File and Storage Technologies. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Aguilera, M. K., Janakiraman, R., and Xu, L. 2005. Using erasure codes for storage in a distributed system. In Proceedings of IEEE International Conference on Dependable Systems and Networks. IEEE, Los Alamitos, CA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Blahut, R. E. 2003. Algebraic Codes for Data Transmission. Cambridge University Press.Google ScholarGoogle Scholar
  4. Blaum, M. and Roth, R. M. 1999. On lowest-density MDS codes. IEEE Trans. Inf. Theory. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Blaum, M., Brady, J., Bruck, J., and Menon, J. 1995. EVENODD: An efficient scheme for tolerating double disk failures in RAID architectures. IEEE Trans.Computers. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Blomer, J., Kalfane, M., Karp, R., Karpinski, M., Luby, M., and Zuckerman, D. 1995. An XOR-based erasure-resilient coding scheme. Tech. rep. TR-95-048, ICSI, Berkeley, CA.Google ScholarGoogle Scholar
  7. Borthakur, D., Schmidt, R., Vadali, R., Chen, S., and Kling, P. 2010. HDFS RAID. Hadoop User Group Meeting.Google ScholarGoogle Scholar
  8. Cadambe, V. R., Huang, C., and Li, J. 2011. Permutation code: Optimal exact-repair of a single failed node in MDS code based distributed storage systems. In Proceedings of IEEE International Symposium on Information Theory. IEEE, Los Alamitos, CA.Google ScholarGoogle Scholar
  9. Cadambe, V. R., Huang, C., Li, J., and Mehrotra, S. 2011. Polynomial length MDS codes with optimal repair in distributed storage. In Proceedings of the Asilomar Conference on Signals, Systems and Computers.Google ScholarGoogle Scholar
  10. Calder, B., Wang, J., Ogus, A., Nilakantan, N., Skjolsvold, A., McKelvie, S., Xu, Y., Srivastav, S., Wu, J., Simitci, H., Haridas, J., Uddaraju, C., Khatri, H., Edwards, A., Bedekar, V., Mainali, S., Abbasi, R., Agarwal, A., Ul Haq, M. F., Ul Haq, M. I., Bhardwaj, D., Dayanand, S., Adusumilli, A., McNett, M., Sankaran, S., Manivannan, K., and Rigas, L. 2011. Windows azure storage: A highly available cloud storage service with strong consistency. In Proceedings of the ACM Symposium on Operating Systems Principles. ACM, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Chang, F., Ji, M., Leung, S.-T., Maccormick, J., Perl, S., and Zhang, L. 2002. Myriad: Cost-effective disaster tolerance. In Proceedings of the USENIX Conference on File and Storage Technologies. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Chen, M., Huang, C., and Li, J. 2007. On the maximally recoverable property for multi-protection group codes. In Proceedings of the IEEE International Symposium on Information Theory.Google ScholarGoogle Scholar
  13. Chen, P. M., Lee, E. K., Gibson, G. A., Katz, R. H., and Patterson, D. A. 1994. RAID -- High-performance, reliable secondary storage. ACM Comput. Surv. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Chen, Y., Edler, J., Goldberg, A., Gottlieb, A., Sobti, S., and Yianilos, P. 1999. A prototype implementation of archival intermemory. In Proceedings of the ACM Conference on Digital Libraries. ACM, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Corbett, P., English, B., Goel, A., Grcanac, T., Kleiman, S., Leong, J., and Sankar, S. 2004. Row-diagonal parity for double disk failure correction. In Proceedings of the USENIX Conference on File and Storage Technologies. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Dimakis, A. G., Godfrey, P. B., Wu, Y., Wainwright, M., and Ramchandran, K. 2010. Network coding for distributed storage systems. IEEE Trans. Inf. Theory. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Dingledine, R., Freedman, M. J., and Molnar, D. 2000. The free haven project: Distributed anonymous storage service. In Proceedings of the Workshop on Design Issues in Anonymity and Unobservability. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Dubnicki, C., Gryz, L., Heldt, L., Kaczmarczyk, M., Kilian, W., Strzelczak, P., Szczepkowski, J., Ungureanu, C., and Welnicki, M. 2009. Hydrastor: A scalable secondary storage. In Proceedings of the USENIX Conference on File and Storage Technologies. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Fikes, A. 2010. Storage architecture and challenges. Google Faculty Summit.Google ScholarGoogle Scholar
  20. Ford, D., Labelle, F., Popovici, F. I., Stokely, M., Truong, V.-A., Barroso, L., Grimes, C., and Quinlan, S. 2010. Availability in globally distributed storage systems. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Gallager, R. G. 1963. Low-density parity-check codes. MIT Press, Cambridge, MA.Google ScholarGoogle Scholar
  22. Gantenbein, D. 2012. A better way to store data. Microsoft Res. Featured Stories. http://research.microsoft.com/en-us/news/features/erasurecoding-090512.aspx.Google ScholarGoogle Scholar
  23. Ghemawat, S., Gobioff, H., and Leung, S.-T. 2003. The Google file system. In Proceedings of the ACM Symposium on Operating Systems Principles. ACM, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Gopalan, P., Huang, C., Simitci, H., and Yekhanin, S. 2011. On the locality of codeword symbols. In Proceedings of the Allerton Conference on Communication, Control, and Computing.Google ScholarGoogle Scholar
  25. Greenan, K. M., Li, X., and Wylie, J. J. 2010. Flat XOR-based erasure codes in storage systems: Constructions, efficient recovery, and tradeoffs. In Proceedings of the IEEE Mass Storage Systems and Technologies. IEEE, Los Alamitos, CA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Greenan, K. M., Long, D. E., Miller, E. L., Schwarz, T. J. E., and Wylie, J. J. 2008. A spin-up saved is energy earned: Achieving power-efficient, erasure-coded storage. In Proceedings of the USENIX Workshop on Hot Topics in System Dependability. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Greenan, K. M., Miller, E., and Wylie, J. J. 2008. Reliability of flat XOR-based erasure codes on heterogeneous devices. In Proceedings of the IEEE International Conference on Dependable Systems and Networks.Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Grolimund, D. 2007. P2P online storage. In Proceedings of the Web 2.0 Expo.Google ScholarGoogle Scholar
  29. Haeberlen, A., Mislove, A., and Druschel, P. 2005. Glacier: Highly durable, decentralized storage despite massive correlated failures. In Proceedings of the USENIX Symposium on Networked Systems Design and Implementation. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Hafner, J. L. 2005. Weaver codes: Highly fault tolerant erasure codes for storage systems. In Proceedings of the USENIX Conference on File and Storage Technologies. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Hafner, J. L. and Rao, K. 2006. Notes on reliability models for non-MDS erasure codes. IBM Tech. rep. RJ10391.Google ScholarGoogle Scholar
  32. Hafner, J. L., Deenadhayalan, V. W., Rao, K., and Tomlin, J. A. 2005. Matrix methods for lost data reconstruction in erasure codes. In Proceedings of the USENIX Conference on File and Storage Technologies. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Hamilton, J. 2007. An architecture for modular data centers. In Proceedings of the Conference on Innovative Data Systems Research.Google ScholarGoogle Scholar
  34. Hosekote, D. K., He, D., and Hafner, J. L. 2007. REO: A generic RAID engine and optimizer. In Proceedings of the USENIX Conference on File and Storage Technologies. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Huang, C. and Xu, L. 2003. Fast software implementation of finite field operations. Tech. rep., Washington University, St. Louis, MO.Google ScholarGoogle Scholar
  36. Huang, C. and Xu, L. 2008. Star: An efficient coding scheme for correcting triple storage node failures. IEEE Trans. Computers. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Huang, C., Chen, M., and Li, J. 2007. Pyramid codes: Flexible schemes to trade space for access efficiency in reliable data storage systems. In Proceedings of the IEEE International Symposium on Network Computing and Applications. IEEE, Los Alamitos, CA.Google ScholarGoogle Scholar
  38. Huang, C., Li, J., and Chen, M. 2007. On optimizing XOR-based codes for fault-tolerant storage applications. In Proceedings of the IEEE Information Theory Workshop. IEEE, Los Alamitos, CA.Google ScholarGoogle Scholar
  39. Huang, C., Simitci, H., Xu, Y., Ogus, A., Calder, B., Gopalan, P., Li, J., and Yekhanin, S. 2012. Erasure Coding in Windows Azure Storage. In Proceedings of the USENIX Annual Technical Conference. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Khan, O., Burns, R., Plank, J., and Huang, C. 2011. In search of I/O-optimal recovery from disk failures. In Proceedings of the USENIX Workshop on Hot Topics in Storage and File Systems. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Khan, O., Burns, R., Plank, J., Pierce, W., and Huang, C. 2012. Rethinking erasure codes for cloud file systems: Minimizing I/O for recovery and degraded reads. In Proceedings of the USENIX Conference on File and Storage Technologies. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Kubiatowicz, J., Bindel, D., Chen, Y., Czerwinski, S., Eaton, P., Geels, D., Gummadi, R., Rhea, S., Weatherspoon, H., Weimer, W., Wells, C., and Zhao, B. 2000. Oceanstore: An architecture for global-scale persistent storage. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Luby, M. G., Mitzenmacher, M., Shokrollahi, A., and Spielman, D. A. 2001. Efficient erasure correcting codes. IEEE Trans. Inf. Theory. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Luo, J., Xu, L., and Plank, J. S. 2009. An efficient XOR-scheduling algorithm for erasure codes encoding. In Proceedings of the IEEE International Conference on Dependable Systems and Networks.Google ScholarGoogle Scholar
  45. MacWilliams, F. J. and Sloane, N. J. A. 1977. The Theory of Error Correcting Codes. North-Holland, Amsterdam.Google ScholarGoogle Scholar
  46. Maymounkov, P. and Mazieres, D. 2003. Rateless codes and big downloads. In Proceedings of the International Workshop on Peer-To-Peer Systems.Google ScholarGoogle Scholar
  47. Papailiopoulos, D. S., Luo, J., Dimakis, A. G., Huang, C., and Li, J. 2012. Simple regenerating codes: Network coding for cloud storage. In Proceedings of the IEEE INFOCOM Mini-Conference. IEEE, Los Alamitos, CA.Google ScholarGoogle Scholar
  48. Plank, J. S. 1997. A tutorial on reed-solomon coding for fault-tolerance in RAID-like systems. Softw. Pract. Exper. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Plank, J. S. 2008. The RAID-6 liberation codes. In Proceedings of the USENIX Conference on File and Storage Technologies. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Plank, J. S. and Xu, L. 2006. Optimizing cauchy reed-solomon codes for fault-tolerant network storage applications. In Proceedings of the IEEE International Symposium on Network Computing and Applications. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Reed, I. S. and Solomon, G. 1960. Polynomial codes over certain finite fields. J. Soc. Industrial Appl. Math.Google ScholarGoogle Scholar
  52. Rhea, S., Eaton, P., Geels, D., Weatherspoon, H., Zhao, B., and Kubiatowicz, J. 2003. Pond: The oceanstore prototype. In Proceedings of the USENIX Conference on File and Storage Technologies. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Rowstron, A. and Druschel, P. 2001. Storage management and caching in past, a large-scale, persistent peer-to-peer storage utility. In Proceedings of the ACM Symposium on Operating Systems Principles. ACM, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Saito, Y., Frolund, S., Veitch, A., Merchant, A., and Spence, S. 2004. FAB: Building distributed enterprise disk arrays from commodity components. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Schrijver, A. 2003. Combinatorial optimization, polyhedra and efficiency. Alg. Combinatorics.Google ScholarGoogle Scholar
  56. Schroeder, B. and Gibson, G. A. 2007. Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you? In Proceedings of the USENIX Conference on File and Storage Technologies. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Shvachko, K., Kuang, H., Radia, S., and Chansler, R. 2010. The hadoop distributed file system. In Proceedings of the IEEE Symposium on Massive Storage Systems and Technologies. IEEE, Los Alamitos, CA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. Suh, C. and Ramchandran, K. 2011. Exact regeneration codes for distributed storage repair using interference alignment. IEEE Trans. Inf. Theory.Google ScholarGoogle Scholar
  59. Tanner, R. M. 1981. A recursive approach to low complexity codes. IEEE Trans. Inf. Theory. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. Ungureanu, C., Atkin, B., Aranya, A., Gokhale, S., Rago, S., Calkowski, G., Dubnicki, C., and Bohra, A. 2010. Hydrafs: A high-throughput file system for the hydrastor content-addressable storage system. In Proceedings of the USENIX Conference on File and Storage Technologies. Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. Wang, Z., Tamo, I., and Bruck, J. 2011. On codes for optimal rebuilding access. Tech. rep. ETR111, Caltech.Google ScholarGoogle Scholar
  62. Weatherspoon, H. and Kubiatowics, J. 2001. Erasure coding vs. replication: A quantitative comparison. In Proceedings of the International Workshop on Peer-To-Peer Systems. Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. Welch, B., Unangst, M., Abbasi, Z., Gibson, G., Mueller, B., Small, J., Zelenka, J., and Zhou, B. 2008. Scalable performance of the Panasas parallel file system. In Proceedings of the USENIX Conference on File and Storage Technologies. Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. Wildani, A., Schwarz, T. J. E., Miller, E. L., and Long, D. E. 2009. Protecting against rare event failures in archival systems. In Proceedings of the IEEE International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems.Google ScholarGoogle Scholar

Index Terms

  1. Pyramid Codes: Flexible Schemes to Trade Space for Access Efficiency in Reliable Data Storage Systems

                  Recommendations

                  Comments

                  Login options

                  Check if you have access through your login credentials or your institution to get full access on this article.

                  Sign in

                  Full Access

                  • Published in

                    cover image ACM Transactions on Storage
                    ACM Transactions on Storage  Volume 9, Issue 1
                    March 2013
                    84 pages
                    ISSN:1553-3077
                    EISSN:1553-3093
                    DOI:10.1145/2435204
                    Issue’s Table of Contents

                    Copyright © 2013 ACM

                    Publisher

                    Association for Computing Machinery

                    New York, NY, United States

                    Publication History

                    • Published: 1 March 2013
                    • Accepted: 1 September 2012
                    • Revised: 1 August 2012
                    • Received: 1 October 2011
                    Published in tos Volume 9, Issue 1

                    Permissions

                    Request permissions about this article.

                    Request Permissions

                    Check for updates

                    Qualifiers

                    • research-article
                    • Research
                    • Refereed

                  PDF Format

                  View or Download as a PDF file.

                  PDF

                  eReader

                  View online with eReader.

                  eReader
                  About Cookies On This Site

                  We use cookies to ensure that we give you the best experience on our website.

                  Learn more

                  Got it!