DOI: 10.1145/1133981.1133995
Article

Shared memory programming for large scale machines

Published: 11 June 2006

ABSTRACT

This paper describes the design and implementation of a scalable runtime system and an optimizing compiler for Unified Parallel C (UPC). An experimental evaluation on BlueGene/L®, a distributed-memory machine, demonstrates that the combination of the compiler with the runtime system produces programs with performance comparable to that of efficient MPI programs and good performance scalability up to hundreds of thousands of processors.

Our runtime system design solves the problem of maintaining shared object consistency efficiently in a distributed-memory machine. Our compiler infrastructure simplifies the code generated for parallel loops in UPC through the elimination of affinity tests, eliminates several levels of indirection for accesses to segments of shared arrays that the compiler can prove to be local, and implements remote update operations through a lower-cost asynchronous message. The performance evaluation uses three well-known benchmarks (HPC RandomAccess, HPC STREAM, and NAS CG) to obtain scaling and absolute performance numbers on up to 131,072 processors, the full BlueGene/L machine. These results were used to win the HPC Challenge Competition at SC05 in Seattle, WA, demonstrating that PGAS languages support both productivity and performance.
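The affinity-test elimination mentioned in the abstract can be illustrated with a minimal sketch in plain C. This is not the paper's compiler output: it assumes a cyclic distribution (block size 1) and an integer affinity expression equal to the loop index, and the function names and the `mythread`/`threads` parameters are hypothetical stand-ins for UPC's `MYTHREAD` and `THREADS`.

```c
#include <assert.h>
#include <stddef.h>

/* Naive lowering of upc_forall (i = 0; i < n; i++; i):
   every thread walks the whole iteration space and runs the body
   only when the affinity test passes. Assumes a cyclic layout. */
long sum_with_affinity_test(const long *a, size_t n,
                            size_t mythread, size_t threads)
{
    assert(threads > 0 && mythread < threads);
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i % threads == mythread) {   /* affinity test, evaluated n times */
            sum += a[i];
        }
    }
    return sum;
}

/* Same set of iterations with the test eliminated: start at this
   thread's first index and stride by the thread count. */
long sum_strided(const long *a, size_t n,
                 size_t mythread, size_t threads)
{
    assert(threads > 0 && mythread < threads);
    long sum = 0;
    for (size_t i = mythread; i < n; i += threads) {
        sum += a[i];
    }
    return sum;
}
```

Both functions visit exactly the indices owned by one thread, so calling them with the same arguments should return the same value; the strided form simply avoids evaluating the test on iterations that belong to other threads. A real UPC compiler would also have to handle other block sizes and pointer-valued affinity expressions, which this sketch leaves out.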


      • Published in

        PLDI '06: Proceedings of the 27th ACM SIGPLAN Conference on Programming Language Design and Implementation
        June 2006
        438 pages
        ISBN: 1595933204
        DOI: 10.1145/1133981
        • ACM SIGPLAN Notices, Volume 41, Issue 6
          Proceedings of the 2006 PLDI Conference
          June 2006
          426 pages
          ISSN: 0362-1340
          EISSN: 1558-1160
          DOI: 10.1145/1133255

        Copyright © 2006 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 406 of 2,067 submissions, 20%
