Abstract
The Gemini interconnect on the Cray XE6 platform provides lightweight remote direct memory access (RDMA) between nodes, which is useful for implementing partitioned global address space (PGAS) languages like UPC and Co-Array Fortran. In this paper, we study Gemini performance using a set of communication microbenchmarks and compare the performance of one-sided communication in PGAS languages with two-sided MPI. Our results demonstrate the performance benefits of the PGAS model on Gemini hardware, showing in which circumstances and by how much one-sided communication outperforms two-sided in terms of messaging rate, aggregate bandwidth, and the ability to overlap computation with communication. For example, for 8-byte and 2KB messages the one-sided messaging rate is, respectively, 5 and 10 times greater than the two-sided one. The study also reveals important information about how to optimize one-sided Gemini communication.
References
- S. Alam, W. Sawyer, T. Stitt, N. Stringfellow, and A. Tineo. Evaluation of productivity and performance characteristics of CCE CAF and UPC compilers. In CUG 2010, Edinburgh, Scotland, May 2010.
- P. Bala, T. Clark, and S. L. Ridgway. Application of Pfortran and Co-Array Fortran in the parallelization of the GROMOS96 molecular dynamics module. Scientific Programming, 9:61-68, January 2001.
- R. Barrett. Co-Array Fortran experiences with finite differencing methods. In The 48th Cray User Group Meeting, Lugano, Italy, May 2006.
- C. Bell, D. Bonachea, R. Nishtala, and K. Yelick. Optimizing bandwidth limited problems using one-sided communication and overlap. In 20th International Parallel and Distributed Processing Symposium (IPDPS), April 2006.
- W. W. Carlson, J. M. Draper, D. E. Culler, K. Yelick, E. Brooks, and K. Warren. Introduction to UPC and language specification. Technical Report CCS-TR-99-157, May 1999.
- C. Coarfa, Y. Dotsenko, J. Eckhardt, and J. Mellor-Crummey. Co-Array Fortran performance and potential: An NPB experimental study. In Proc. of the 16th Intl. Workshop on Languages and Compilers for Parallel Computing, 2003.
- C. Coarfa, Y. Dotsenko, and J. Mellor-Crummey. Experiences with Sweep3D implementations in Co-Array Fortran. The Journal of Supercomputing, 36:101-121, May 2006.
- T. El-Ghazawi and F. Cantonnet. UPC performance and potential: An NPB experimental study. In Supercomputing, 2002.
- A. Geist. Sustained petascale: The next MPI challenge. In EuroPVM/MPI, October 2007.
- H. Jin, R. Hood, and P. Mehrotra. A practical study of UPC with the NAS Parallel Benchmarks. In Partitioned Global Address Space Languages, October 2009.
- J. Mellor-Crummey, L. Adhianto, W. N. Scherer III, and G. Jin. A new vision for Coarray Fortran. In Proceedings of the 3rd Conference on Partitioned Global Address Space Programming Models, PGAS '09, pages 5:1-5:9, New York, NY, USA, 2009.
- OSU Micro-Benchmarks. http://mvapich.cse.ohio-state.edu/benchmarks/.
- R. Nishtala, P. Hargrove, D. Bonachea, and K. Yelick. Scaling communication-intensive applications on BlueGene/P using one-sided communication and overlap. In 23rd International Parallel and Distributed Processing Symposium (IPDPS), 2009.
- NAS Parallel Benchmarks. http://www.nas.nasa.gov/Resources/Software/npb.html.
- R. W. Numrich. Parallel numerical algorithms based on tensor notation and Co-Array Fortran syntax. Parallel Computing, 31:588-607, June 2005.
- R. W. Numrich and J. Reid. Co-Array Fortran for parallel programming. ACM SIGPLAN Fortran Forum, 17(2):1-31, August 1998.
- R. Preissl, N. Wichmann, B. Long, J. Shalf, S. Ethier, and A. Koniges. Multithreaded global address space communication techniques for gyrokinetic fusion applications on ultra-scale platforms. In SC2011, to appear, November 2011.
- J. Reid. Co-Array Fortran for full and sparse matrices. In Proceedings of the 6th International Conference on Applied Parallel Computing Advanced Scientific Computing, PARA '02, London, 2002.
- H. Shan, F. Blagojevic, S. J. Min, P. Hargrove, H. Jin, K. Fuerlinger, A. Koniges, and N. J. Wright. A programming model performance study using the NAS Parallel Benchmarks. Scientific Programming: Exploring Languages for Expressing Medium to Massive On-Chip Parallelism, 18(3-4), August 2010.
- K. D. Underwood, M. J. Levenhagen, and R. Brightwell. Evaluating NIC hardware requirements to achieve high message rate PGAS support on multi-core processors. In SC '07: Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, New York, NY, USA, 2007.
- W. Gropp. Challenges for the message passing interface in the petaflops era. www.cs.uiuc.edu/homes/wgropp/bib/talks/tdata/2007/mpifuture-uiuc.pdf.
A preliminary evaluation of the hardware acceleration of the Cray Gemini interconnect for PGAS languages and comparison with MPI. In PMBS '11: Proceedings of the Second International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems.