research-article

SCF: A Framework for Task-Level Coordination in Reconfigurable, Heterogeneous Systems

Published: 01 June 2012

Abstract

Heterogeneous computing systems composed of accelerators such as FPGAs, GPUs, and manycore processors coupled with standard microprocessors are becoming an increasingly popular solution for future computing systems because of their higher performance and energy efficiency. Although programming languages and tools are evolving to simplify device-level design, programming such systems remains difficult and time-consuming, largely because of system-wide challenges involving communication between heterogeneous devices, which currently require ad hoc solutions. Most communication frameworks and APIs that have dominated parallel application development for decades were developed for homogeneous systems and hence cannot be directly employed for hybrid systems. To solve this problem, this article presents the System Coordination Framework (SCF), which employs message passing to transparently enable communication between tasks that are described using different programming tools (and languages) and that run on heterogeneous processing devices, in systems from domains ranging from embedded systems to High-Performance Computing (HPC) systems. By hiding low-level architectural details of the underlying communication from the application designer, SCF can improve application development productivity, provide higher levels of application portability, and offer rapid design-space exploration of different task/device mappings. In addition, SCF enables custom communication synthesis that exploits mechanisms specific to different devices and platforms, which can provide performance improvements over the generic solutions employed previously. Our results indicate performance improvements of 28× and 682× from employing FPGA devices for the two applications presented in this article, while the use of SCF simultaneously improves developer productivity by approximately 2.5 to 5 times.

