Abstract
Declustering techniques reduce query response times through parallel I/O by distributing data among parallel disks. Recently, replication-based approaches were proposed to further reduce the response time. Efficient retrieval of replicated data from multiple disks is a challenging problem. Existing retrieval techniques are designed for storage arrays with identical disks that have no initial load or network delay. In this article, we consider the generalized retrieval problem of replicated data, where the disks in the system might be heterogeneous, the disks may have initial load, and the storage arrays might be located at different sites. We first formulate the generalized retrieval problem using a Linear Programming (LP) model and solve it with mixed integer programming techniques. Next, we formulate the generalized retrieval problem as a maximum flow problem, which can be solved more efficiently. We prove that the retrieval schedule returned by the maximum flow technique yields the optimal response time and that this result matches the LP solution. We also propose a low-complexity online algorithm for the generalized retrieval problem that does not guarantee the optimality of the result. The performance of the proposed and state-of-the-art retrieval strategies is investigated using various replication schemes, query types, query loads, disk specifications, network delays, and initial loads.
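To make the flow-based view of the retrieval problem concrete, the following is a minimal sketch and not the article's algorithm. It assumes a simple cost model that the abstract does not spell out: a disk that retrieves k blocks finishes at time (initial load + network delay + k × per-block cost), all times are integers in a common unit, and every requested block has at least one replica. For a candidate response time, each disk's capacity is the number of blocks it can finish in time, and an augmenting-path search (a Ford-Fulkerson-style flow on the block-to-disk graph) checks whether every block can be assigned. A binary search over the candidate times then returns the smallest feasible one. The names `feasible`, `optimal_retrieval`, `replicas`, and `disks` are hypothetical.

```python
def feasible(replicas, caps):
    """Return a block -> disk assignment that respects caps, or None."""
    assigned = {d: [] for d in caps}  # disk -> blocks currently routed to it

    def place(block, seen):
        # Augmenting-path search: use a free slot, or displace a block
        # from a full disk if that block can be routed somewhere else.
        for d in replicas[block]:
            if d in seen:
                continue
            seen.add(d)
            if len(assigned[d]) < caps[d]:
                assigned[d].append(block)
                return True
            for i, other in enumerate(assigned[d]):
                if place(other, seen):
                    assigned[d][i] = block
                    return True
        return False

    for block in replicas:
        if not place(block, set()):
            return None
    return {b: d for d, blocks in assigned.items() for b in blocks}


def optimal_retrieval(replicas, disks):
    """replicas: block -> list of disk ids holding a copy.
    disks: disk id -> (cost_per_block, initial_load, network_delay).
    Returns (response_time, assignment), or None for an empty query."""
    n = len(replicas)
    start = {d: load + delay for d, (_, load, delay) in disks.items()}
    # Under the assumed cost model, the optimal response time equals
    # start[d] + k * cost[d] for some disk d and some k in 1..n.
    candidates = sorted({start[d] + k * disks[d][0]
                         for d in disks for k in range(1, n + 1)})
    best, lo, hi = None, 0, len(candidates) - 1
    while lo <= hi:  # feasibility is monotone in the time budget
        mid = (lo + hi) // 2
        t = candidates[mid]
        caps = {d: max(0, (t - start[d]) // disks[d][0]) for d in disks}
        plan = feasible(replicas, caps)
        if plan is not None:
            best, hi = (t, plan), mid - 1
        else:
            lo = mid + 1
    return best


# Hypothetical example: one fast disk, one remote disk, one busy disk.
disks = {"A": (2, 0, 0), "B": (5, 0, 1), "C": (5, 10, 0)}
replicas = {"b0": ["A", "B"], "b1": ["A", "B"],
            "b2": ["A", "C"], "b3": ["B", "C"]}
response_time, plan = optimal_retrieval(replicas, disks)
```

In this example the fast disk A serves three blocks and the remote disk B serves one, so the busy disk C is avoided entirely. A production implementation would more likely build an explicit flow network and use a push-relabel solver than the recursive search shown here.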
Index Terms
Generalized Optimal Response Time Retrieval of Replicated Data from Storage Arrays