
Agility and Performance in Elastic Distributed Storage

Published: 31 October 2014

Abstract

Elastic storage systems can be expanded or contracted to meet current demand, allowing servers to be turned off or used for other tasks. However, the usefulness of an elastic distributed storage system is limited by its agility: how quickly it can increase or decrease its number of servers. Because such systems must migrate large amounts of data during elastic resizing, state-of-the-art designs usually have to make painful trade-offs among performance, elasticity, and agility.
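To make the link between migrated data and agility concrete, here is a back-of-envelope Python sketch. It assumes, purely for illustration, that a resize cannot finish until all required data has moved and that migration runs at a fixed aggregate bandwidth. The function name `resize_time_s` and the input numbers are hypothetical, not taken from the article.

```python
def resize_time_s(data_to_migrate_gb: float, migration_bw_gb_per_s: float) -> float:
    """Seconds before a resize can complete, assuming (hypothetically) that
    data migration is the only bottleneck and runs at a fixed aggregate bandwidth."""
    if migration_bw_gb_per_s <= 0:
        raise ValueError("migration bandwidth must be positive")
    return data_to_migrate_gb / migration_bw_gb_per_s


# Illustrative inputs only, not figures from the article:
baseline = resize_time_s(2000.0, 1.0)  # 2,000 GB to move at 1 GB/s
reduced = resize_time_s(20.0, 1.0)     # same resize with 100x less data to move
print(baseline, reduced)  # 2000.0 20.0
```

Under this simplified model, resize time scales linearly with the data that must be migrated, so reducing that data by two orders of magnitude shortens the resize by the same factor. Real systems must also share bandwidth with foreground I/O, which this model ignores.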

This article describes the state of the art in elastic storage and a new system, called SpringFS, that can quickly change its number of active servers while still meeting its elasticity and performance goals. SpringFS uses a novel technique, termed bounded write offloading, that restricts the set of servers to which writes aimed at overloaded servers are redirected. This technique, combined with the read offloading and passive migration policies used in SpringFS, minimizes the work needed before servers are deactivated or activated. Analysis of real-world traces from Hadoop deployments at Facebook and various Cloudera customers, together with experiments on the SpringFS prototype, confirms SpringFS's agility. The results show that SpringFS reduces the amount of data migrated for elastic resizing by up to two orders of magnitude and cuts the percentage of active servers required by 67–82%, outdoing state-of-the-art designs by 6–120%.
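The abstract does not spell out how SpringFS chooses offload targets, so the following is only a minimal Python sketch of the general idea of bounded write offloading. It assumes a per-server utilization value, a hypothetical `capacity` threshold above which a server counts as overloaded, and a hypothetical fixed `offload_set` of servers allowed to absorb redirected writes; `choose_write_server` is an illustrative name, not SpringFS's API.

```python
from typing import Dict, List


def choose_write_server(
    target: int,
    load: Dict[int, float],
    capacity: float,
    offload_set: List[int],
) -> int:
    """Return the server that should absorb a write aimed at `target`.

    Hypothetical bounded write-offloading policy: write to `target` if it has
    headroom; otherwise redirect to the least-loaded server *inside*
    `offload_set` that has headroom. Servers outside the set are never chosen.
    If no server in the set has headroom, the write stays on `target`.
    """
    if load[target] < capacity:
        return target
    candidates = [s for s in offload_set if s != target and load[s] < capacity]
    if not candidates:
        return target
    return min(candidates, key=lambda s: load[s])


load = {0: 0.9, 1: 0.4, 2: 0.2, 3: 0.1}  # normalized utilization per server
offload_set = [0, 1, 2]  # server 3 lies outside the bound
print(choose_write_server(0, load, 0.8, offload_set))  # 2, not the idler server 3
```

Bounding the choice this way means offloaded writes land only on a known subset of servers, even when a server outside that subset is idler. The intuition, consistent with the abstract, is that servers outside the subset then hold no offloaded data and can be deactivated without first migrating it; the abstract credits read offloading and passive migration with keeping the remaining resizing work small.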
