research-article

From Silicon to Science: The Long Road to Production Reconfigurable Supercomputing

Published: 01 September 2009

Abstract

The field of high-performance computing (HPC) currently abounds with excitement about the potential of a broad class of devices known as accelerators. Yet few accelerator-based systems are being deployed in general-purpose HPC environments. Why is that? This article explores the challenges that accelerators face in the HPC world, with a specific focus on FPGA-based systems. We begin with an overview of the characteristics and challenges of typical HPC systems and applications, and discuss why FPGAs could have a significant impact. The bulk of the article focuses on twelve specific areas where FPGA researchers can contribute to hastening the adoption of FPGAs in HPC environments.



    • Published in

      ACM Transactions on Reconfigurable Technology and Systems, Volume 2, Issue 4
      September 2009
      134 pages
      ISSN: 1936-7406
      EISSN: 1936-7414
      DOI: 10.1145/1575779

      Copyright © 2009 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 September 2009
      • Accepted: 1 October 2008
      • Revised: 1 September 2008
      • Received: 1 May 2008


      Qualifiers

      • research-article
      • Research
      • Refereed
