Abstract
Multi-tiered storage, where each tier consists of one type of storage device (e.g., SSDs, HDDs, or disk arrays), is a commonly used approach to achieve both high performance and cost efficiency in large-scale systems that must store data with vastly different access characteristics. By aligning the access characteristics of the data, whether fixed-sized extents or variable-sized files, with the characteristics of the storage devices, higher performance can be achieved for any given cost. This article presents ExaPlan, a method that determines both the data-to-tier assignment and the number of devices in each tier so as to minimize the system's mean response time for a given budget and workload. In contrast to other methods, which constrain or minimize the system load, ExaPlan directly minimizes the system's mean response time as estimated by a queueing model. Minimizing the mean response time is typically intractable, because the resulting optimization problem is both nonconvex and combinatorial. ExaPlan circumvents this intractability by introducing a parameterized data placement approach, which makes it highly scalable and readily applicable to exascale systems. Experiments that use parameters from real-world storage systems, such as those at CERN and LOFAR, demonstrate that ExaPlan yields lower mean response times than previous works. ExaPlan supports standalone SSDs and HDDs as well as disk arrays as storage tiers. Although it uses a static workload representation, we provide empirical evidence that the underlying dynamic workloads have invariant properties that can be deemed static for the purpose of provisioning a storage system. ExaPlan is also effective as a load-balancing tool for placing data across the devices within a tier, reducing the response time by up to 3.6-fold compared with a traditional load-balancing algorithm such as the Longest Processing Time (LPT) heuristic.
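The abstract uses the Longest Processing Time (LPT) heuristic as a load-balancing baseline and judges placements by a queueing-model estimate of mean response time. The minimal sketch below illustrates only that baseline side, under loudly stated assumptions: each data item's "load" is just its request rate, all devices in the tier are identical, and each device behaves as an independent M/M/1 queue with service rate `service_rate`. The function names `lpt_assign` and `mean_response_time` are hypothetical, and none of this is ExaPlan's queueing model or its parameterized placement.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def lpt_assign(rates: Sequence[float], num_devices: int) -> List[List[int]]:
    """LPT heuristic: visit items in decreasing order of request rate and
    give each one to the device with the smallest accumulated rate so far."""
    if num_devices < 1:
        raise ValueError("need at least one device")
    # Min-heap of (accumulated request rate, device index).
    heap: List[Tuple[float, int]] = [(0.0, d) for d in range(num_devices)]
    heapq.heapify(heap)
    placement: List[List[int]] = [[] for _ in range(num_devices)]
    order = sorted(range(len(rates)), key=lambda i: rates[i], reverse=True)
    for item in order:
        total, dev = heapq.heappop(heap)
        placement[dev].append(item)
        heapq.heappush(heap, (total + rates[item], dev))
    return placement


def mean_response_time(rates: Sequence[float],
                       placement: List[List[int]],
                       service_rate: float) -> float:
    """Request-weighted mean response time, ASSUMING every device is an
    independent M/M/1 queue, so a device with arrival rate lam has mean
    response time 1 / (service_rate - lam). Returns inf if any device
    is saturated (lam >= service_rate)."""
    total_rate = sum(rates)
    if total_rate == 0:
        return 0.0
    weighted = 0.0
    for items in placement:
        lam = sum(rates[i] for i in items)
        if lam == 0:
            continue  # an idle device contributes no requests
        if lam >= service_rate:
            return math.inf
        weighted += lam / (service_rate - lam)
    return weighted / total_rate


if __name__ == "__main__":
    item_rates = [50.0, 40.0, 30.0, 20.0, 10.0]  # hypothetical requests/s
    plan = lpt_assign(item_rates, num_devices=2)
    print(plan)  # [[0, 3, 4], [1, 2]]
    print(mean_response_time(item_rates, plan, service_rate=100.0))
```

LPT balances only the aggregate request rate per device; it never looks at the response-time estimate itself. The abstract's contrast is that ExaPlan optimizes the queueing-model mean response time directly, which this sketch does not attempt to reproduce.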
ExaPlan: Efficient Queueing-Based Data Placement, Provisioning, and Load Balancing for Large Tiered Storage Systems