Storage Workload Identification

Published: 12 May 2016

Abstract

Storage workload identification is the task of characterizing a workload in a storage system (more specifically, a network storage system: NAS or SAN) and matching it with previously known workloads. We refer to storage workload identification as “workload identification” in the rest of this article. Workload identification is an important problem for cloud providers to solve because (1) providers can use this information to colocate similar workloads and make the system more predictable and (2) providers can identify workloads and then give subscribers guidance on the associated best practices (with respect to configuration) for provisioning those workloads.

Historically, people have identified workloads by looking at their read/write ratios, random/sequential ratios, block sizes, and interarrival frequencies. Researchers are well aware that workload characteristics change over time and that a point-in-time view of a workload will characterize its behavior incorrectly. Manual detection of workload signatures is becoming increasingly hard because (1) it is difficult for a human to detect a pattern and (2) representing a workload signature by a tuple of average values, one for each signature component, leads to a large error.
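To illustrate the second point, the following minimal sketch computes the four traditional signature components over a whole trace and then over fixed time windows. It is not the algorithm presented in this article. The `IORequest` record, the `signature` and `windowed_signatures` helpers, the rule that a request is sequential when it starts where the previous one ended, and the synthetic two-phase trace are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class IORequest:
    """Hypothetical trace record; field names are illustrative only."""
    timestamp: float  # arrival time in seconds
    offset: int       # starting byte offset
    size: int         # request size in bytes
    is_read: bool     # True for reads, False for writes


def signature(requests):
    """Return (read ratio, sequential ratio, mean block size, mean interarrival gap)."""
    n = len(requests)
    pairs = list(zip(requests, requests[1:]))
    read_ratio = sum(r.is_read for r in requests) / n
    # Assumption: a request is sequential if it starts where the previous one ended.
    sequential = sum(1 for prev, cur in pairs if cur.offset == prev.offset + prev.size)
    sequential_ratio = sequential / len(pairs) if pairs else 0.0
    mean_block = mean(r.size for r in requests)
    gaps = [cur.timestamp - prev.timestamp for prev, cur in pairs]
    mean_gap = mean(gaps) if gaps else 0.0
    return (read_ratio, sequential_ratio, mean_block, mean_gap)


def windowed_signatures(requests, window_seconds):
    """Compute one signature per non-empty fixed-length time window."""
    start = requests[0].timestamp
    windows = {}
    for r in requests:
        index = int((r.timestamp - start) // window_seconds)
        windows.setdefault(index, []).append(r)
    return [signature(windows[k]) for k in sorted(windows)]


# Synthetic trace with two phases: small sequential reads, then large random writes.
trace = [IORequest(float(t), t * 4096, 4096, True) for t in range(4)]
trace += [
    IORequest(10.0 + i, off, 65536, False)
    for i, off in enumerate([1_000_000, 5_000_000, 2_000_000, 9_000_000])
]

whole_trace = signature(trace)                 # one tuple of averages
per_window = windowed_signatures(trace, 10.0)  # one tuple per phase
```

In this synthetic trace, the whole-trace tuple reports a read ratio of 0.5 and a mean block size of 34816 bytes, values that describe neither phase. The per-window tuples separate the all-read, fully sequential 4 KiB phase from the all-write, fully random 64 KiB phase.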

In this article, we present a workload signature detection and matching algorithm that can correctly identify workload signatures and match them with other similar workload signatures. We have tested our algorithm on nine different workloads generated using publicly available traces and on real customer workloads running in the field to show the robustness of our approach.

Published in

ACM Transactions on Storage, Volume 12, Issue 3
June 2016
237 pages
ISSN: 1553-3077
EISSN: 1553-3093
DOI: 10.1145/2932205

Copyright © 2016 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 12 May 2016
• Accepted: 1 August 2015
• Revised: 1 June 2015
• Received: 1 April 2013