Generalized Optimal Response Time Retrieval of Replicated Data from Storage Arrays

Published: 01 July 2013

Abstract

Declustering techniques reduce query response times through parallel I/O by distributing data among parallel disks. Recently, replication-based approaches have been proposed to reduce response time further. Efficient retrieval of replicated data from multiple disks is a challenging problem. Existing retrieval techniques are designed for storage arrays with identical disks that have no initial load or network delay. In this article, we consider the generalized retrieval problem for replicated data, in which the disks in the system may be heterogeneous, the disks may have an initial load, and the storage arrays may be located at different sites. We first formulate the generalized retrieval problem using a Linear Programming (LP) model and solve it with mixed integer programming techniques. Next, we formulate the generalized retrieval problem as a maximum flow problem, which can be solved more efficiently. We prove that the retrieval schedule returned by the maximum flow technique yields the optimal response time and that this result matches the LP solution. We also propose a low-complexity online algorithm for the generalized retrieval problem that does not guarantee the optimality of the result. The performance of the proposed and state-of-the-art retrieval strategies is investigated using various replication schemes, query types, query loads, disk specifications, network delays, and initial loads.
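
To make the maximum flow idea concrete, the following is a minimal sketch and not the article's algorithm. It assumes a simple cost model that the abstract does not spell out: a disk that retrieves k blocks finishes at delay + load + k * cost, with all values given as integers. Under that assumption, a candidate response time is feasible if a flow network (source to buckets, buckets to the disks holding their replicas, disks to sink with a capacity derived from the candidate time) carries one unit of flow per requested bucket. The function names, the data layout, and the greedy heuristic at the end are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


def max_flow(cap, s, t):
    """Edmonds-Karp maximum flow on a dict-of-dicts residual graph."""
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        bottleneck, v = float("inf"), t
        while parent[v] is not None:  # smallest residual capacity on the path
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:  # push flow and update reverse edges
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck


def feasible(replicas, disks, deadline):
    """True if every bucket can be retrieved by `deadline` (assumed cost model)."""
    cap = defaultdict(lambda: defaultdict(int))
    for bucket, holders in replicas.items():
        cap["s"][("b", bucket)] = 1
        for d in holders:
            cap[("b", bucket)][("d", d)] = 1
    for d, (cost, load, delay) in disks.items():
        # Number of blocks disk d can serve before the deadline.
        cap[("d", d)]["t"] = max(0, (deadline - load - delay) // cost)
    return max_flow(cap, "s", "t") == len(replicas)


def optimal_response_time(replicas, disks):
    """Smallest candidate finish time for which a full retrieval schedule exists."""
    n = len(replicas)
    candidates = sorted({load + delay + k * cost
                         for cost, load, delay in disks.values()
                         for k in range(1, n + 1)})
    for t in candidates:  # feasibility is monotone, so the first hit is optimal
        if feasible(replicas, disks, t):
            return t
    return None


def greedy_response_time(replicas, disks):
    """Hypothetical online heuristic: send each bucket to the replica disk
    that would finish it earliest. No optimality guarantee."""
    finish = {d: load + delay for d, (cost, load, delay) in disks.items()}
    used = set()
    for bucket, holders in replicas.items():
        d = min(holders, key=lambda x: finish[x] + disks[x][0])
        finish[d] += disks[d][0]
        used.add(d)
    return max(finish[d] for d in used)


# Example: disks map to (cost per block, initial load, network delay).
disks = {"A": (2, 0, 0), "B": (1, 3, 1)}
replicas = {"b0": ["A"], "b1": ["A", "B"], "b2": ["B"]}
best = optimal_response_time(replicas, disks)
approx = greedy_response_time(replicas, disks)
```

In this sketch the candidate times are scanned in increasing order; because feasibility is monotone in the deadline, a binary search over the same sorted list would work as well. The greedy routine only illustrates the trade-off mentioned in the abstract between low complexity and guaranteed optimality.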

