research-article

ExaPlan: Efficient Queueing-Based Data Placement, Provisioning, and Load Balancing for Large Tiered Storage Systems

Published: 22 May 2017

Abstract

Multi-tiered storage, where each tier consists of one type of storage device (e.g., SSD, HDD, or disk arrays), is a commonly used approach to achieve both high performance and cost efficiency in large-scale systems that need to store data with vastly different access characteristics. By aligning the access characteristics of the data, either fixed-sized extents or variable-sized files, to the characteristics of the storage devices, a higher performance can be achieved for any given cost. This article presents ExaPlan, a method to determine both the data-to-tier assignment and the number of devices in each tier that minimize the system’s mean response time for a given budget and workload. In contrast to other methods that constrain or minimize the system load, ExaPlan directly minimizes the system’s mean response time estimated by a queueing model. Minimizing the mean response time is typically intractable as the resulting optimization problem is both nonconvex and combinatorial in nature. ExaPlan circumvents this intractability by introducing a parameterized data placement approach that makes it a highly scalable method that can be easily applied to exascale systems. Through experiments that use parameters from real-world storage systems, such as CERN and LOFAR, it is demonstrated that ExaPlan provides solutions that yield lower mean response times than previous works. It supports standalone SSDs and HDDs as well as disk arrays as storage tiers, and although it uses a static workload representation, we provide empirical evidence that underlying dynamic workloads have invariant properties that can be deemed static for the purpose of provisioning a storage system. ExaPlan is also effective as a load-balancing tool used for placing data across devices within a tier, resulting in an up to 3.6-fold reduction of response time compared with a traditional load-balancing algorithm, such as the Longest Processing Time heuristic.
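The Longest Processing Time (LPT) heuristic named above as the load-balancing baseline can be sketched as follows, together with one simple way a queueing estimate might score a placement. This is only an illustration under stated assumptions: the abstract does not specify ExaPlan's queueing model, so each device is treated here as an independent M/M/1 queue with Poisson arrivals and a common exponential service time, and the function names, request rates, and service time are all hypothetical.

```python
import heapq


def lpt_assign(loads, num_devices):
    """LPT heuristic: take items in order of decreasing load and give each
    one to the device that currently has the smallest total load."""
    heap = [(0.0, d) for d in range(num_devices)]  # (total load, device index)
    assignment = [[] for _ in range(num_devices)]
    order = sorted(range(len(loads)), key=lambda i: loads[i], reverse=True)
    for item in order:
        total, dev = heapq.heappop(heap)
        assignment[dev].append(item)
        heapq.heappush(heap, (total + loads[item], dev))
    return assignment


def mean_response_time(assignment, rates, service_time):
    """Request-weighted mean response time across devices.

    Assumption (not taken from the article): every device is an M/M/1 queue
    with the same mean service time, so a device with utilization rho
    contributes rho / (1 - rho) to the rate-weighted sum of response times.
    """
    total_rate = sum(rates)
    weighted = 0.0
    for items in assignment:
        rho = service_time * sum(rates[i] for i in items)
        if rho >= 1.0:
            return float("inf")  # overloaded device: the queue is unstable
        weighted += rho / (1.0 - rho)
    return weighted / total_rate


# Hypothetical workload: request rates (per second) of six data items
# and a common mean service time of 10 ms on two identical devices.
rates = [40.0, 30.0, 20.0, 15.0, 8.0, 4.0]
service_time = 0.01
loads = [r * service_time for r in rates]

lpt_placement = lpt_assign(loads, 2)
skewed_placement = [[0, 1, 2], [3, 4, 5]]  # hot items packed on one device

print("LPT placement:", lpt_placement)
print("LPT mean response time (s):",
      mean_response_time(lpt_placement, rates, service_time))
print("Skewed mean response time (s):",
      mean_response_time(skewed_placement, rates, service_time))
```

LPT balances load, whereas the abstract describes ExaPlan as minimizing the estimated mean response time directly; the scoring function above is only meant to show what kind of quantity such an objective evaluates.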

Published in

ACM Transactions on Storage, Volume 13, Issue 2
Special Issue on MSST 2016 and Regular Papers
May 2017
199 pages
ISSN: 1553-3077
EISSN: 1553-3093
DOI: 10.1145/3098275
Editor: Sam H. Noh

Copyright © 2017 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 22 May 2017
• Revised: 1 March 2017
• Accepted: 1 March 2017
• Received: 1 October 2016

Qualifiers

• research-article
• Research
• Refereed
