
Exploiting Redundancies and Deferred Writes to Conserve Energy in Erasure-Coded Storage Clusters

Published: 01 July 2013

Abstract

We present a power-efficient scheme for erasure-coded storage clusters---ECS2---which aims to offer high energy efficiency with marginal reliability degradation. ECS2 utilizes data redundancies and deferred writes to conserve energy. In ECS2, parity blocks are buffered exclusively in active data nodes, whereas parity nodes are placed into low-power mode. A (k + r, k) RS-coded ECS2 can achieve ⌈(r + 1)/2⌉-fault tolerance for the k active data nodes and r-fault tolerance for all k + r nodes. ECS2 employs three optimization approaches to improve the energy efficiency of storage clusters: (1) an adaptive threshold policy takes system configurations and I/O workloads into account to maximize standby periods; (2) a selective activation policy minimizes the number of power transitions in storage nodes; and (3) a region-based buffer policy speeds up the synchronization process by migrating parity blocks in batches. After implementing an ECS2-based prototype in a Linux cluster, we evaluated its energy efficiency and performance using four different types of I/O workloads. The experimental results indicate that, compared to energy-oblivious erasure-coded storage, ECS2 can reduce the energy consumed by storage clusters by up to 29.8% under read-intensive workloads and 28.0% under write-dominated workloads when k = 6 and r = 3. The results also show that ECS2 achieves high power efficiency in both normal and failure cases without noticeably affecting the I/O performance of storage clusters.
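The fault-tolerance claim in the abstract can be sanity-checked with a small sketch (an illustration of the stated property, not the authors' implementation): when the r parity nodes sleep and parity blocks are buffered on active data nodes, the k active nodes tolerate ⌈(r + 1)/2⌉ failures, while the full k + r-node cluster retains the usual r-fault tolerance of a Reed-Solomon code.

```python
import math

def active_fault_tolerance(r: int) -> int:
    """Failures tolerable among the k active data nodes while the r
    parity nodes are in low-power mode and parity is buffered on
    active data nodes (the paper's ceil((r + 1) / 2) bound)."""
    return math.ceil((r + 1) / 2)

def full_fault_tolerance(r: int) -> int:
    """Failures tolerable across all k + r nodes: the standard
    r-fault tolerance of a (k + r, k) RS code."""
    return r

# The paper's evaluated configuration: (k + r, k) = (9, 6).
k, r = 6, 3
print(active_fault_tolerance(r))  # 2 faults among the 6 active data nodes
print(full_fault_tolerance(r))    # 3 faults across all 9 nodes
```

Note that for r = 3 the sleeping configuration gives up one fault of tolerance on the active set (2 instead of 3), which is the "marginal reliability degradation" the abstract refers to.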



    Reviews

    David Gary Hill

Storage clusters consume a great deal of energy, so devising a power-efficiency scheme to conserve it is worthwhile. The authors caution that while existing energy conservation techniques for storage clusters do reduce energy consumption, the savings come at the expense of system performance or reliability. This paper describes an approach that overcomes most of these drawbacks while improving the energy efficiency of the storage cluster by, the authors claim, almost 30 percent over similar but energy-oblivious storage implementations.

The authors apply their approach to erasure-coded storage clusters. Erasure coding is a data protection method that improves the ability to recover data after storage node failures: a mathematical function expands data fragments in a way that requires less redundancy for a given level of recoverability than more commonly used techniques such as replication. The resulting pieces are then spread across storage nodes.

The proposed strategy manages storage nodes so that as many of their components as possible are kept in low-power mode. One way to do this is to defer updates (that is, use buffered writes) temporarily to nonvolatile random access memory (NVRAM) or flash devices, which avoids spinning up disks that are in low-power mode.

The paper goes into considerable technical detail on how the proposed strategy works. Storage system designers should find it useful. Online Computing Reviews Service
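The deferred-write idea the review describes, buffering parity updates so sleeping parity nodes need not spin up and then migrating them in a batch, can be sketched as follows. This is an illustrative toy, not ECS2's actual interface; the class and method names are hypothetical.

```python
from collections import defaultdict

class DeferredParityBuffer:
    """Sketch of a deferred-write buffer: accumulate parity-block
    updates destined for a sleeping parity node, then flush them in
    one batch when the node is activated or the buffer fills."""

    def __init__(self, capacity: int = 4):
        self.capacity = capacity
        self.pending = defaultdict(bytes)  # block_id -> latest parity bytes

    def defer_write(self, block_id: int, parity: bytes) -> bool:
        """Buffer a parity update; return True if a flush is now due.
        Later writes to the same block supersede earlier ones, which is
        one source of the energy savings: only the final version is
        ever written to the parity node's disk."""
        self.pending[block_id] = parity
        return len(self.pending) >= self.capacity

    def flush(self):
        """Migrate all buffered parity blocks in one batch and clear
        the buffer; sorting by block id favors sequential I/O."""
        batch = sorted(self.pending.items())
        self.pending.clear()
        return batch

buf = DeferredParityBuffer(capacity=2)
buf.defer_write(7, b"p7")
need_flush = buf.defer_write(3, b"p3")
print(need_flush)   # True: capacity reached
print(buf.flush())  # [(3, b'p3'), (7, b'p7')]
```

Batching the flush amortizes the cost of waking the parity node over many writes, which is the same trade made by the region-based buffer policy the paper evaluates.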
