Abstract
Database storage management at data centers is a manual, time-consuming, and error-prone task. It involves regularly moving database objects across storage nodes to balance I/O bandwidth utilization across disk drives. Achieving this balance is critical for avoiding I/O bottlenecks and thereby maximizing the utilization of the storage system. Manual management, however, increases administrative costs and carries a greater risk of untimely and erroneous operations. We address these concerns with STORM, an automated approach that combines low-overhead gathering of database access and storage usage patterns with efficient analysis to generate accurate and timely hints for the administrator regarding data movement operations. STORM's primary objective is to minimize the volume of data moved during reconfiguration (and thus the potential downtime or loss of performance), subject to the secondary constraints of space and balanced I/O bandwidth utilization across the storage devices. We analyze and evaluate STORM theoretically, through a simulation framework, and experimentally. We show that the dynamic data layout reconfiguration problem is NP-hard, and we present a heuristic that provides an approximate solution in O(N log(N/M) + (N/M)^2) time, where M is the number of storage devices and N is the total number of database objects residing on them. A simulation study shows that the heuristic converges to an acceptable solution that balances storage utilization to within 7% of the ideal solution. Finally, an experimental study demonstrates that the STORM approach can improve the overall performance of the TPC-C benchmark by as much as 22% by reconfiguring an initial random, but evenly distributed, placement of database objects.
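To make the trade-off in the abstract concrete, the following is a minimal sketch of a greedy rebalancing pass. It is not STORM's heuristic, which the abstract does not spell out; it only illustrates the stated goal of relieving the busiest device while moving as few bytes as possible and respecting space limits. The names `DbObject`, `rebalance`, and `tolerance`, and the rule of preferring objects with the highest I/O load per byte, are all assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass
class DbObject:
    name: str
    size: int     # space occupied, in any consistent unit (hypothetical field)
    load: float   # observed I/O bandwidth demand (hypothetical field)
    device: int   # index of the device currently holding the object


def rebalance(objects, capacities, tolerance=0.05):
    """Return a list of (object name, source, destination) move hints.

    Illustrative greedy pass, not the published heuristic. Updates each
    moved object's `device` field in place.
    """
    m = len(capacities)
    load = [0.0] * m
    used = [0] * m
    for o in objects:
        load[o.device] += o.load
        used[o.device] += o.size
    target = sum(load) / m  # ideal per-device load

    moves = []
    while True:
        # Work on the currently busiest device.
        src = max(range(m), key=lambda d: load[d])
        if load[src] - target <= tolerance * target:
            break  # busiest device is close enough to the ideal

        best = None  # (load per byte, object, destination)
        for o in objects:
            if o.device != src or o.load <= 0 or o.size <= 0:
                continue
            # A destination must have space, and the move must leave it
            # strictly less loaded than the source was, so every move
            # makes progress and the loop terminates.
            fits = [
                d for d in range(m)
                if d != src
                and used[d] + o.size <= capacities[d]
                and load[d] + o.load < load[src]
            ]
            if not fits:
                continue
            dst = min(fits, key=lambda d: load[d])
            density = o.load / o.size  # load relieved per byte moved
            if best is None or density > best[0]:
                best = (density, o, dst)

        if best is None:
            break  # no feasible move improves the balance

        _, o, dst = best
        load[src] -= o.load
        used[src] -= o.size
        load[dst] += o.load
        used[dst] += o.size
        o.device = dst
        moves.append((o.name, src, dst))
    return moves
```

Preferring objects with a high load-to-size ratio is one simple way to favor small, hot objects over large, cold ones, which keeps the volume of data moved low. A production tool would also need to account for the cost of the moves themselves and for workloads that shift over time.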
Index Terms
Workload-based generation of administrator hints for optimizing database storage utilization