Abstract
Reconfigurable Computing (RC) systems based on FPGAs are becoming an increasingly attractive option for building the parallel systems of the future. Applications targeting such systems have demonstrated superior performance and reduced energy consumption versus their traditional counterparts based on microprocessors. However, most such work has been limited to small system sizes. Unlike traditional HPC systems, scalable reconfigurable HPC systems lack integrated, system-wide parallel-programming models and languages, which presents a significant design challenge for creating applications that target them. In this article, we extend the traditional Partitioned Global Address Space (PGAS) model to provide a multilevel integration of memory, which simplifies development of parallel applications for such systems and improves developer productivity. The new multilevel-PGAS programming model captures the unique characteristics of reconfigurable HPC systems, such as the existence of multiple levels of memory hierarchy and heterogeneous computation resources. Based on this model, we extend and adapt the SHMEM communication library to become what we call SHMEM+, the first known SHMEM library enabling coordination between FPGAs and CPUs in a reconfigurable, heterogeneous HPC system. Applications designed with SHMEM+ yield improved developer productivity compared to current methods of multidevice RC design and exhibit a high degree of portability. In addition, the SHMEM+ library itself is designed to be portable and provides peak communication bandwidth comparable to vendor-proprietary versions of SHMEM. Application case studies are presented to illustrate the advantages of SHMEM+.
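To make the idea of a multilevel partitioned global address space more concrete, the following is a minimal toy sketch in Python. It is not the SHMEM+ API and does not reflect how the library is implemented; the names `MultilevelPGAS`, `Node`, `put`, `get`, and the `CPU`/`FPGA` level labels are all hypothetical and chosen only for illustration. The sketch assumes each processing element (PE) owns one partition of memory per device level, and that any PE's partition can be addressed by (PE, level, offset) with one-sided reads and writes.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the two memory levels on each node (an assumption,
# not names taken from SHMEM+).
CPU, FPGA = "cpu", "fpga"


@dataclass
class Node:
    """One processing element (PE) owning a memory partition per level."""
    size: int
    mem: dict = field(init=False)

    def __post_init__(self):
        # Each level gets its own zero-initialised partition of `size` words.
        self.mem = {CPU: [0] * self.size, FPGA: [0] * self.size}


class MultilevelPGAS:
    """Toy global address space addressed by (pe, level, offset)."""

    def __init__(self, num_pes, words_per_level):
        self.nodes = [Node(words_per_level) for _ in range(num_pes)]

    def put(self, pe, level, offset, values):
        """One-sided write into the partition owned by `pe` at `level`."""
        target = self.nodes[pe].mem[level]
        if offset < 0 or offset + len(values) > len(target):
            raise IndexError("transfer exceeds the target partition")
        target[offset:offset + len(values)] = values

    def get(self, pe, level, offset, count):
        """One-sided read from the partition owned by `pe` at `level`."""
        source = self.nodes[pe].mem[level]
        if offset < 0 or offset + count > len(source):
            raise IndexError("transfer exceeds the source partition")
        return list(source[offset:offset + count])


# Usage: write three words into the FPGA-level partition of PE 1, then read
# back both the FPGA-level and CPU-level partitions at the same offset.
space = MultilevelPGAS(num_pes=2, words_per_level=8)
space.put(pe=1, level=FPGA, offset=2, values=[10, 20, 30])
fpga_view = space.get(pe=1, level=FPGA, offset=2, count=3)
cpu_view = space.get(pe=1, level=CPU, offset=2, count=3)
```

In this toy model the caller uses the same `put` and `get` calls regardless of which device holds the data, which is the kind of uniform view the abstract attributes to the multilevel model. A real library would additionally have to handle data movement across the interconnect and between host and FPGA memory, which the sketch leaves out entirely.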
SHMEM+: A multilevel-PGAS programming model for reconfigurable supercomputing