PARRAY: a unifying array representation for heterogeneous parallelism

Published: 25 February 2012

Abstract

This paper introduces a programming interface called PARRAY (or Parallelizing ARRAYs) that supports succinct system-level programming for heterogeneous parallel systems such as GPU clusters. Current practice requires combining several low-level libraries like Pthreads, OpenMP, CUDA and MPI, and achieving productivity and portability is hard across different numbers and models of GPUs. PARRAY extends mainstream C programming with novel array types that have three distinctive features: 1) the dimensions of an array type are nested in a tree, conceptually reflecting the memory hierarchy; 2) the definition of an array type may contain references to other array types, allowing sophisticated array types to be created for parallelization; 3) threads also form arrays, enabling a Single-Program-Multiple-Codeblock (SPMC) programming style that unifies various sophisticated communication patterns. This leads to shorter, more portable and maintainable parallel code, while the programmer retains control over the performance-related features needed for deep manual optimization. Although the source-to-source code generator only faithfully generates low-level library calls according to the type information, higher-level programming and automatic performance optimization remain possible by building libraries of sub-programs on top of PARRAY. A case study on cluster FFT illustrates a simple 30-line code that outperforms Intel Cluster MKL by 2x on the Tianhe-1A system with 7168 Fermi GPUs and 14336 CPUs.



Published in

ACM SIGPLAN Notices, Volume 47, Issue 8 (PPoPP '12), August 2012, 334 pages. ISSN: 0362-1340. EISSN: 1558-1160. DOI: 10.1145/2370036.

Also in: PPoPP '12: Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, February 2012, 352 pages. ISBN: 9781450311601. DOI: 10.1145/2145816.

Copyright © 2012 ACM. Publisher: Association for Computing Machinery, New York, NY, United States.

Qualifiers: research-article
