research-article

Efficient, portable implementation of asynchronous multi-place programs

Published: 14 February 2009

Abstract

The X10 programming language is organized around the notions of places (an encapsulation of data and the activities operating on that data), a partitioned global address space (PGAS), and asynchronous computation and communication.

This paper introduces an expressive subset of X10, Flat X10, designed to permit efficient execution across multiple single-threaded places with a simple runtime and without compromising on the productivity of X10. We present the design, implementation and evaluation of a compiler and runtime system for Flat X10. The Flat X10 compiler translates programs into C++ SPMD programs communicating using an active messaging infrastructure. It uses novel techniques to transform explicitly parallel programs into SPMD programs. The runtime system is based on IBM's LAPI (Low-level API) and is easily portable to other libraries such as GASNet and ARMCI.
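To make the execution model concrete, here is a minimal sketch of single-threaded places exchanging active messages. It is an in-process simulation, not the Flat X10 runtime and not the LAPI, GASNet, or ARMCI API. Every name in it (`Place`, `ActiveMessage`, `Simulation`, `demo_total`) is hypothetical and chosen only for illustration. Each place owns its data and an inbox; sending enqueues a handler invocation at the destination and returns immediately, and each place runs its pending handlers one at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not part of any real messaging library.
struct ActiveMessage {
    std::size_t handler_id;  // index into a handler table shared by all places
    long payload;            // argument handed to the handler on arrival
};

struct Place {
    long counter = 0;                 // data owned by this place
    std::deque<ActiveMessage> inbox;  // messages waiting to run at this place
};

using Handler = std::function<void(Place&, long)>;

class Simulation {
public:
    explicit Simulation(std::size_t num_places) : places_(num_places) {}

    // Register a handler and return its id (the same id is valid at every place).
    std::size_t register_handler(Handler h) {
        handlers_.push_back(std::move(h));
        return handlers_.size() - 1;
    }

    // Asynchronous send: enqueue at the destination and return immediately.
    void send(std::size_t dest, std::size_t handler_id, long payload) {
        assert(dest < places_.size() && handler_id < handlers_.size());
        places_[dest].inbox.push_back({handler_id, payload});
        ++pending_;
    }

    // Each place drains its own inbox sequentially (single-threaded places).
    // Stops when no message remains anywhere, a crude termination check.
    void run_to_quiescence() {
        while (pending_ > 0) {
            for (Place& p : places_) {
                while (!p.inbox.empty()) {
                    ActiveMessage m = p.inbox.front();
                    p.inbox.pop_front();
                    --pending_;
                    handlers_[m.handler_id](p, m.payload);
                }
            }
        }
    }

    const Place& place(std::size_t i) const { return places_[i]; }
    std::size_t size() const { return places_.size(); }

private:
    std::vector<Place> places_;
    std::vector<Handler> handlers_;
    std::size_t pending_ = 0;
};

// SPMD-style driver: the same loop body runs for every place, and each
// place sends (its index + 1) to its right-hand neighbor.
long demo_total(std::size_t num_places) {
    Simulation sim(num_places);
    std::size_t add = sim.register_handler(
        [](Place& p, long v) { p.counter += v; });
    for (std::size_t here = 0; here < num_places; ++here) {
        sim.send((here + 1) % num_places, add, static_cast<long>(here) + 1);
    }
    sim.run_to_quiescence();
    long total = 0;
    for (std::size_t i = 0; i < sim.size(); ++i) total += sim.place(i).counter;
    return total;
}
```

Calling `demo_total(4)` sends the values 1 through 4 around a ring of four places, so the counters sum to 10. A real runtime would replace the in-memory inboxes with network transport and would need a proper distributed termination protocol, neither of which this sketch attempts.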

Our implementation realizes performance comparable to hand-written MPI programs for well-known HPC benchmarks such as Random Access, Stream, and FFT, on a Federation-based cluster of Power5 SMPs (with hundreds of processors) and the Blue Gene (with thousands of processors). Submissions based on the work presented in this paper were co-winners of the 2007 and 2008 HPC Challenge Type II Awards.

    • Published in

      ACM SIGPLAN Notices, Volume 44, Issue 4
      PPoPP '09
      April 2009
      294 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/1594835

      PPoPP '09: Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
      February 2009
      322 pages
      ISBN: 9781605583976
      DOI: 10.1145/1504176

      Copyright © 2009 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States
