SHMEM+: A multilevel-PGAS programming model for reconfigurable supercomputing

Published: 22 August 2011

Abstract

Reconfigurable Computing (RC) systems based on FPGAs are becoming an increasingly attractive solution for building the parallel systems of the future. Applications targeting such systems have demonstrated superior performance and reduced energy consumption versus their traditional counterparts based on microprocessors. However, most such work has been limited to small system sizes. Unlike in traditional HPC systems, the lack of integrated, system-wide, parallel-programming models and languages presents a significant design challenge for creating applications targeting scalable, reconfigurable HPC systems. In this article, we extend the traditional Partitioned Global Address Space (PGAS) model to provide a multilevel integration of memory, which simplifies development of parallel applications for such systems and improves developer productivity. The new multilevel-PGAS programming model captures the unique characteristics of reconfigurable HPC systems, such as the existence of multiple levels of memory hierarchy and heterogeneous computation resources. Based on this model, we extend and adapt the SHMEM communication library to become what we call SHMEM+, the first known SHMEM library enabling coordination between FPGAs and CPUs in a reconfigurable, heterogeneous HPC system. Applications designed with SHMEM+ yield improved developer productivity compared to current methods of multidevice RC design and exhibit a high degree of portability. In addition, our design of the SHMEM+ library itself is portable and provides peak communication bandwidth comparable to vendor-proprietary versions of SHMEM. Application case studies are presented to illustrate the advantages of SHMEM+.

