Abstract
Storage workload identification is the task of characterizing a workload in a storage system (more specifically, a network storage system such as NAS or SAN) and matching it with previously known workloads. We refer to storage workload identification as “workload identification” in the rest of this article. Workload identification is an important problem for cloud providers because (1) providers can use this information to colocate similar workloads and make the system more predictable, and (2) providers can identify workloads and then give subscribers guidance on the associated configuration best practices for provisioning those workloads.
Historically, people have identified workloads by looking at their read/write ratios, random/sequential ratios, block sizes, and interarrival frequencies. Researchers are well aware that workload characteristics change over time and that a point-in-time view of a workload mischaracterizes its behavior. Manual detection of workload signatures is becoming increasingly difficult because (1) it is hard for a human to detect patterns in the data and (2) representing a workload signature as a tuple of average values, one per signature component, leads to large errors.
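To make the averaging problem concrete, here is a minimal Python sketch that computes the four classic features per fixed time window and compares them with a single whole-trace tuple. It assumes a simplified trace record (timestamp, read/write flag, byte offset, size) rather than any real trace format, treats a request as sequential only if it starts exactly where the previous one ended, and uses an arbitrary 10-second window; the names `IORecord`, `window_features`, and `windowed_signature` are hypothetical and do not come from the article.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class IORecord:
    timestamp: float  # seconds since the start of the trace
    is_read: bool     # True for a read, False for a write
    offset: int       # starting byte offset of the request
    size: int         # request size in bytes


# (read ratio, sequential ratio, mean block size in bytes, mean interarrival in s)
Features = Tuple[float, float, float, float]


def window_features(records: List[IORecord]) -> Features:
    """Compute the four classic features for a non-empty, time-ordered list of records."""
    n = len(records)
    reads = sum(r.is_read for r in records)
    pairs = list(zip(records, records[1:]))
    # A request counts as sequential if it starts exactly where the previous one ended.
    sequential = sum(1 for prev, cur in pairs if cur.offset == prev.offset + prev.size)
    seq_ratio = sequential / len(pairs) if pairs else 0.0
    mean_size = sum(r.size for r in records) / n
    gaps = [cur.timestamp - prev.timestamp for prev, cur in pairs]
    mean_gap = sum(gaps) / len(gaps) if gaps else 0.0
    return (reads / n, seq_ratio, mean_size, mean_gap)


def windowed_signature(records: List[IORecord], window_s: float) -> List[Features]:
    """Bucket records into fixed-length time windows and featurize each non-empty window."""
    buckets: Dict[int, List[IORecord]] = {}
    for r in records:
        buckets.setdefault(int(r.timestamp // window_s), []).append(r)
    return [window_features(buckets[k]) for k in sorted(buckets)]


# Synthetic two-phase trace: 100 sequential 64 KiB reads, then 100 scattered 4 KiB writes.
trace = [IORecord(0.1 * i, True, i * 65536, 65536) for i in range(100)]
trace += [IORecord(10.0 + 0.1 * i, False, i * 1_000_000, 4096) for i in range(100)]

for w, f in enumerate(windowed_signature(trace, window_s=10.0)):
    print(f"window {w}: {f}")
print("whole-trace average:", window_features(trace))
```

On this synthetic trace the two windows report a read ratio of 1.0 with 64 KiB sequential requests and a read ratio of 0.0 with 4 KiB non-sequential requests, while the whole-trace tuple reports a 0.5 read ratio, a sequential ratio just under 0.5, and a mean request size of 34816 bytes: a profile that describes neither phase.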
In this article, we present a workload signature detection and matching algorithm that correctly identifies workload signatures and matches them with similar workload signatures. We tested our algorithm on nine different workloads generated from publicly available traces and on real customer workloads running in the field to show the robustness of our approach.
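The abstract does not describe the matching algorithm itself, so the following is only an illustrative stand-in, not the authors' method. This minimal Python sketch treats a signature as a set of per-window feature vectors and identifies a query workload by the smallest symmetric mean nearest-neighbor distance (an averaged, Hausdorff-style set distance) to each known signature. The function names, the per-feature scaling, and the reference signatures are all hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Vector = Tuple[float, ...]


def scaled_distance(a: Vector, b: Vector, scale: Vector) -> float:
    """Euclidean distance after dividing each feature by a per-feature scale."""
    return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scale)))


def signature_distance(p: Sequence[Vector], q: Sequence[Vector], scale: Vector) -> float:
    """Symmetric mean nearest-neighbor distance between two sets of window vectors."""
    def directed(src: Sequence[Vector], dst: Sequence[Vector]) -> float:
        # For each window in src, find its closest window in dst, then average.
        return sum(min(scaled_distance(a, b, scale) for b in dst) for a in src) / len(src)
    return (directed(p, q) + directed(q, p)) / 2


def identify(query: Sequence[Vector], known: Dict[str, Sequence[Vector]], scale: Vector) -> str:
    """Return the name of the known signature closest to the query signature."""
    return min(known, key=lambda name: signature_distance(query, known[name], scale))


# Hypothetical reference signatures: (read ratio, sequential ratio, mean block size in KiB).
known = {
    "small_random_mixed": [(0.7, 0.1, 8.0), (0.6, 0.1, 8.0)],
    "large_sequential_read": [(1.0, 1.0, 256.0)],
    "sequential_write": [(0.0, 1.0, 64.0)],
}
scale = (1.0, 1.0, 256.0)  # assumed scaling so block size does not swamp the ratios
query = [(0.65, 0.15, 8.0)]
print(identify(query, known, scale))  # prints small_random_mixed
```

Comparing sets of windows rather than one averaged tuple keeps distinct phases visible to the matcher; a practical matcher would also have to deal with window ordering, noisy traces, and feature weighting, none of which this sketch attempts.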
Index Terms: Storage Workload Identification