A file assignment strategy independent of workload characteristic assumptions

Published: 30 November 2009

Abstract

The problem of statically assigning nonpartitioned files in a parallel I/O system has been extensively investigated. Most existing solutions rest on a basic workload characteristic assumption: that there is a strong inverse correlation between file access frequency and file size. In other words, the most popular files are typically small, while large files are relatively unpopular. Recent studies of Web proxy traces suggest, however, that the correlation, if any, is so weak that it can be ignored. Two questions therefore arise naturally. First, can existing algorithms still perform well when the workload assumption does not hold? Second, if not, can one develop a new file assignment strategy that is immune to the workload assumption? To answer these questions, we first evaluate the performance of three well-known file assignment algorithms both with and without the workload assumption. Next, we develop a novel static nonpartitioned file assignment strategy for parallel I/O systems, called static round-robin (SOR), which is immune to the workload assumption. Comprehensive experimental results show that SOR consistently improves mean response time over the existing schemes.
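
The abstract names the strategy but does not spell out its steps. As a rough illustration only, the Python sketch below shows one plausible reading of a static round-robin assignment of whole (nonpartitioned) files: order the files by size, then deal them onto the disks in turn. The sort key, the function name `round_robin_assign`, and the sample data are assumptions made for this sketch and are not taken from the paper.

```python
from typing import Dict, List


def round_robin_assign(files: Dict[str, int], num_disks: int) -> List[List[str]]:
    """Assign whole files to disks in round-robin order.

    `files` maps a file name to its size. Sorting by size before dealing
    is an assumption of this sketch, not a detail stated in the abstract.
    """
    if num_disks <= 0:
        raise ValueError("num_disks must be positive")

    # Order files by ascending size; break ties by name so the result
    # is deterministic.
    ordered = sorted(files.items(), key=lambda kv: (kv[1], kv[0]))

    # Deal the ordered files onto the disks one at a time, wrapping
    # around after the last disk. Files are never split across disks.
    disks: List[List[str]] = [[] for _ in range(num_disks)]
    for i, (name, _size) in enumerate(ordered):
        disks[i % num_disks].append(name)
    return disks


if __name__ == "__main__":
    # Hypothetical file sizes, for illustration only.
    sample = {"a": 10, "b": 40, "c": 20, "d": 30, "e": 50}
    print(round_robin_assign(sample, 2))
```

Note that this placement uses only file sizes and never consults access frequencies, which is one way a static scheme could avoid depending on any assumed correlation between popularity and size.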
