Abstract
The field of high-performance computing (HPC) currently abounds with excitement about the potential of a broad class of devices called accelerators. And yet, few accelerator-based systems are being deployed in general-purpose HPC environments. Why is that? This article explores the challenges that accelerators face in the HPC world, with a specific focus on FPGA-based systems. We begin with an overview of the characteristics and challenges of typical HPC systems and applications and discuss why FPGAs have the potential to make a significant impact. The bulk of the article focuses on twelve specific areas where FPGA researchers can make contributions to hasten the adoption of FPGAs in HPC environments.
From Silicon to Science: The Long Road to Production Reconfigurable Supercomputing