Abstract
Heterogeneous computing systems composed of accelerators such as FPGAs, GPUs, and manycore processors coupled with standard microprocessors are becoming an increasingly popular solution for future computing systems because of their higher performance and energy efficiency. Although programming languages and tools are evolving to simplify device-level design, programming such systems remains difficult and time-consuming, largely because of system-wide challenges in communication between heterogeneous devices, which currently require ad hoc solutions. Most communication frameworks and APIs that have dominated parallel application development for decades were developed for homogeneous systems and hence cannot be employed directly on hybrid systems. To solve this problem, this article presents the System Coordination Framework (SCF), which employs message passing to transparently enable communication between tasks that are described using different programming tools (and languages) and that run on heterogeneous processing devices, in systems from domains ranging from embedded systems to High-Performance Computing (HPC). By hiding low-level architectural details of the underlying communication from the application designer, SCF can improve application development productivity, provide higher levels of application portability, and offer rapid design-space exploration of different task/device mappings. In addition, SCF enables custom communication synthesis that exploits mechanisms specific to different devices and platforms, which can provide performance improvements over the generic solutions employed previously. Our results indicate performance improvements of 28× and 682× from employing FPGA devices for the two applications presented in this article, while using SCF simultaneously improves developer productivity by approximately 2.5 to 5 times.
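The idea of tasks that exchange messages by name, while a separate task/device mapping decides how each message is carried, can be illustrated with a minimal sketch. This is not SCF's actual API, which the abstract does not describe: the `Coordinator` class, the `TRANSPORTS` table, the transport labels, and the `run_pipeline` helper are all hypothetical, and every transport is emulated here with an in-process Python queue rather than real device communication.

```python
import queue
import threading

# Hypothetical table: (source device, destination device) -> transport label.
# A real framework would synthesize device-specific communication instead.
TRANSPORTS = {
    ("cpu", "cpu"): "shared-memory queue",
    ("cpu", "fpga"): "vendor DMA wrapper",
    ("fpga", "cpu"): "vendor DMA wrapper",
}


class Coordinator:
    """Routes messages between named tasks; tasks never see device details."""

    def __init__(self, mapping):
        self.mapping = dict(mapping)  # task name -> device name
        self.inboxes = {name: queue.Queue() for name in self.mapping}

    def transport(self, src, dst):
        # Report which mechanism this mapping would select for src -> dst.
        return TRANSPORTS[(self.mapping[src], self.mapping[dst])]

    def send(self, src, dst, payload):
        # Assumption: every transport is emulated by an in-process queue.
        self.inboxes[dst].put((src, payload))

    def recv(self, dst):
        return self.inboxes[dst].get(timeout=5)


def run_pipeline(mapping, data):
    """Run a two-task producer/filter pipeline under the given mapping."""
    coord = Coordinator(mapping)

    def producer():
        coord.send("producer", "filter", data)

    def filter_task():
        _, values = coord.recv("filter")
        coord.send("filter", "producer", [v * 2 for v in values])

    threads = [
        threading.Thread(target=producer),
        threading.Thread(target=filter_task),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    _, result = coord.recv("producer")
    return result, coord.transport("producer", "filter")


if __name__ == "__main__":
    # The task code is identical in both runs; only the mapping changes.
    print(run_pipeline({"producer": "cpu", "filter": "cpu"}, [1, 2, 3]))
    print(run_pipeline({"producer": "cpu", "filter": "fpga"}, [1, 2, 3]))
```

Because the task bodies refer only to task names, trying a different task/device mapping means changing one dictionary, which is the kind of design-space exploration the abstract attributes to hiding communication details from the application designer.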
SCF: A Framework for Task-Level Coordination in Reconfigurable, Heterogeneous Systems