Abstract
Many scientific applications use parallel I/O to meet their low-latency, high-bandwidth I/O requirements. Among the available parallel I/O operations, collective I/O is one of the most popular methods when the storage layout of the data does not match the access pattern. The implementation of collective I/O typically involves disk I/O operations followed by interprocessor communication. In addition, in many I/O-intensive applications, parallel I/O operations are usually followed by parallel computation. This paper presents a comparative study of different overlap strategies in parallel applications. We have experimented with four overlap strategies: 1) overlapping I/O and communication; 2) overlapping I/O and computation; 3) overlapping computation and communication; and 4) overlapping I/O, communication, and computation. All experiments were conducted on a Linux cluster, and the performance results obtained are very encouraging: on average, we improved the performance of a generic collective read call by 38%, the MxM benchmark by 26%, and the FFT benchmark by 34%.
Comparative evaluation of overlap strategies with study of I/O overlap in MPI-IO