Abstract
Collective communication over a group of processors is an integral and time-consuming component of many HPC applications. Many modern-day supercomputers are based on torus interconnects. On such systems, for an irregular communicator comprising a subset of processors, the algorithms developed so far are not contention-free in general and are hence suboptimal.
In this paper, we present a novel contention-free algorithm to perform collective operations over a subset of processors in a torus network. We also extend previous work on regular communicators to handle special cases of irregular communicators that occur frequently in parallel scientific applications. For the generic case where multiple node-disjoint sub-communicators communicate simultaneously in a loosely synchronous fashion, we propose a novel cooperative approach that routes the data for individual sub-communicators without contention. Empirical results demonstrate that our algorithms outperform the optimized MPI collective implementation on IBM's Blue Gene/P supercomputer for large data sizes and random node distributions.
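To make the contention problem concrete, the following sketch (an illustration of the problem setting, not the paper's algorithm) models a one-step ring exchange on a 1-D torus with shortest-path routing and counts how many messages cross each physical link. A regular communicator spanning the full torus keeps every link load at 1, while a scattered subset whose rank order does not follow the physical layout overloads some links. The function names and the specific node placements are hypothetical, chosen only for the demonstration.

```python
# Illustrative model: link contention on a 1-D torus of n nodes.
# Each sub-communicator member sends one message to the next member
# in rank order (one step of a ring-style collective).

def links_on_path(src, dst, n):
    """Physical links used by a shortest-path route on a ring of n nodes."""
    fwd = (dst - src) % n
    if fwd <= n - fwd:  # forward direction is no longer than backward
        return [((src + k) % n, (src + k + 1) % n) for k in range(fwd)]
    back = n - fwd      # otherwise route backward
    return [((src - k - 1) % n, (src - k) % n) for k in range(back)]

def max_link_load(n, members):
    """Maximum number of simultaneous messages on any physical link
    when member i sends to member (i+1) mod |members|."""
    load = {}
    for i, src in enumerate(members):
        dst = members[(i + 1) % len(members)]
        for link in links_on_path(src, dst, n):
            key = tuple(sorted(link))       # links are undirected here
            load[key] = load.get(key, 0) + 1
    return max(load.values())

# Regular communicator over the whole torus: contention free.
print(max_link_load(8, list(range(8))))   # 1

# Irregular sub-communicator with scattered, out-of-order ranks:
# several routes share the links between nodes 1 and 4.
print(max_link_load(8, [0, 4, 1, 5]))     # 3
```

With load 3 on the busiest link, the exchange is serialized threefold on that link, which is exactly the slowdown a contention-free schedule avoids.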
Index Terms
Collective algorithms for sub-communicators