Abstract
The X10 programming language is organized around the notions of places (each an encapsulation of data and the activities operating on that data), a partitioned global address space (PGAS), and asynchronous computation and communication.
This paper introduces an expressive subset of X10, Flat X10, designed to permit efficient execution across multiple single-threaded places with a simple runtime and without compromising on the productivity of X10. We present the design, implementation and evaluation of a compiler and runtime system for Flat X10. The Flat X10 compiler translates programs into C++ SPMD programs communicating using an active messaging infrastructure. It uses novel techniques to transform explicitly parallel programs into SPMD programs. The runtime system is based on IBM's LAPI (Low-level API) and is easily portable to other libraries such as GASNet and ARMCI.
Our implementation realizes performance comparable to hand-written MPI programs for well-known HPC benchmarks such as Random Access, Stream, and FFT, on a Federation-based cluster of Power5 SMPs (with hundreds of processors) and the Blue Gene (with thousands of processors). Submissions based on the work presented in this paper were co-winners of the 2007 and 2008 HPC Challenge Type II Awards.
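As a rough illustration of the execution model described above (single-threaded places that run the same SPMD program and interact only through active messages), the following C++ sketch simulates several places inside one process. It is a toy under stated assumptions: the names `World`, `Place`, `send`, `finish`, `xor_update`, and `spmd_body` are hypothetical, no LAPI, GASNet, or ARMCI calls are used, and the code is not the output of the Flat X10 compiler. The update pattern is loosely modeled on the Random Access benchmark.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

struct Place;
// An active message names the code to run at the destination plus its argument.
using Handler = void (*)(Place& self, std::uint64_t arg);

struct ActiveMessage {
    Handler handler;
    std::uint64_t arg;
};

// One single-threaded place: its local data partition and its message inbox.
struct Place {
    int id;
    std::vector<std::uint64_t> table;
    std::deque<ActiveMessage> inbox;
};

// Hypothetical in-process stand-in for the messaging layer.
struct World {
    std::vector<Place> places;
    std::size_t pending = 0;  // messages sent but not yet executed

    World(int n, std::size_t local_size) {
        for (int p = 0; p < n; ++p)
            places.push_back(Place{p, std::vector<std::uint64_t>(local_size, 0), {}});
    }

    // Plays the role of a remote `async`: enqueue work at the destination place.
    void send(int dest, Handler h, std::uint64_t arg) {
        assert(dest >= 0 && dest < static_cast<int>(places.size()));
        places[dest].inbox.push_back(ActiveMessage{h, arg});
        ++pending;
    }

    // Plays the role of `finish`: run handlers until nothing is outstanding.
    void finish() {
        while (pending > 0) {
            for (Place& p : places) {
                while (!p.inbox.empty()) {
                    ActiveMessage m = p.inbox.front();
                    p.inbox.pop_front();
                    m.handler(p, m.arg);
                    --pending;
                }
            }
        }
    }
};

// Handler executed at the owning place: XOR the value into a local slot.
void xor_update(Place& self, std::uint64_t value) {
    self.table[value % self.table.size()] ^= value;
}

// The SPMD body: every place runs this same function with its own id.
void spmd_body(World& w, int me, int updates) {
    const std::uint64_t n = w.places.size();
    for (int i = 0; i < updates; ++i) {
        std::uint64_t v = static_cast<std::uint64_t>(me) * 1000 + i + 1;
        w.send(static_cast<int>(v % n), xor_update, v);
    }
}

// XOR of every table entry across all places.
std::uint64_t checksum(const World& w) {
    std::uint64_t c = 0;
    for (const Place& p : w.places)
        for (std::uint64_t x : p.table) c ^= x;
    return c;
}
```

Calling `spmd_body` once per place and then `finish` applies every update exactly once. Because XOR is its own inverse, running the same round a second time returns every table to zero, which gives a simple correctness check for the sketch.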
Efficient, portable implementation of asynchronous multi-place programs
PPoPP '09: Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming