Abstract
The problem of statically assigning nonpartitioned files in a parallel I/O system has been extensively investigated. Most existing solutions rest on a basic assumption about workload characteristics: that there is a strong inverse correlation between file access frequency and file size. In other words, the most popular files are typically small, while large files are relatively unpopular. Recent studies of Web proxy traces suggest, however, that this correlation, if it exists at all, is so weak that it can be ignored. Two questions therefore arise naturally. First, can existing algorithms still perform well when the workload assumption does not hold? Second, if not, can one develop a new file assignment strategy that is immune to the workload assumption? To answer these questions, we first evaluate the performance of three well-known file assignment algorithms both with and without the workload assumption. Next, we develop a novel static nonpartitioned file assignment strategy for parallel I/O systems, called static round-robin (SOR), which is immune to the workload assumption. Comprehensive experimental results show that SOR consistently outperforms the existing schemes in terms of mean response time.
Index Terms
A file assignment strategy independent of workload characteristic assumptions