A preliminary evaluation of the hardware acceleration of the Cray Gemini interconnect for PGAS languages and comparison with MPI

Published: 08 October 2012

Abstract

The Gemini interconnect on the Cray XE6 platform provides lightweight remote direct memory access (RDMA) between nodes, which is useful for implementing partitioned global address space (PGAS) languages like UPC and Co-Array Fortran. In this paper, we study Gemini performance using a set of communication microbenchmarks and compare the performance of one-sided communication in PGAS languages with two-sided MPI. Our results demonstrate the performance benefits of the PGAS model on Gemini hardware, showing in what circumstances and by how much one-sided communication outperforms two-sided in terms of messaging rate, aggregate bandwidth, and the ability to overlap computation with communication. For example, for 8-byte and 2KB messages the one-sided messaging rate is 5 and 10 times higher, respectively, than the two-sided rate. The study also reveals important information about how to optimize one-sided Gemini communication.

