research-article

A hybrid approach of OpenMP for clusters

Published: 25 February 2012

Abstract

We present the first fully automated compiler-runtime system that successfully translates and executes OpenMP shared-address-space programs on laboratory-size clusters, for the complete set of regular, repetitive applications in the NAS Parallel Benchmarks. We introduce a hybrid compiler-runtime translation scheme. Compared to previous work, this scheme features a new runtime data flow analysis and new compiler techniques for improving data affinity and reducing communication costs. We present and discuss the performance of our translated programs and compare it with that of the MPI, HPF, and UPC versions of the benchmarks. The results show that, on average, our translated programs achieve 75% of the performance of the hand-coded MPI programs.



    • Published in

      ACM SIGPLAN Notices  Volume 47, Issue 8
      PPOPP '12
      August 2012
      334 pages
      ISSN:0362-1340
      EISSN:1558-1160
      DOI:10.1145/2370036
      • ACM Conferences
        PPoPP '12: Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
        February 2012
        352 pages
        ISBN:9781450311601
        DOI:10.1145/2145816

      Copyright © 2012 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

