Portable and scalable FPGA-based acceleration of a direct linear system solver

Published: 23 March 2012

Abstract

FPGAs have the potential to serve as a platform for accelerating many computations, including scientific applications. However, the large development cost and short life span of FPGA designs have limited their adoption by the scientific computing community. FPGA-based scientific computing and many kinds of embedded computing could become more practical if there were hardware libraries that were portable to any FPGA-based system, with performance that scaled with the size of the FPGA. To illustrate this idea we have implemented one common supercomputing library function: the LU factorization method for solving systems of linear equations. This paper describes a method for making the design both portable and scalable that should be illustrative if such libraries are to be built in the future. The design is a software-based generator that leverages both the flexibility of a software programming language and the parameters inherent in a hardware description language. The generator accepts parameters that describe the FPGA capacity and external memory capabilities. We compare the performance of our engine executing on the largest FPGA available at the time of this work (an Altera Stratix III 3S340) to a single processor core fabricated in the same 65 nm IC process running a highly optimized software implementation from the processor vendor. For single precision matrices on the order of 10,000 × 10,000 elements, the FPGA implementation is 2.2 times faster and the energy dissipated per useful GFLOP is 5 times lower. For double precision, the FPGA implementation is 1.7 times faster and 3.5 times more energy efficient.
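
For readers unfamiliar with the library function named in the abstract, the following is a minimal software sketch of LU factorization with partial pivoting, followed by forward and back substitution. It is a textbook reference version only and is not the paper's hardware design or generator; the function names `lu_factor` and `lu_solve` and the packed storage layout are assumptions chosen for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def lu_factor(a: Matrix) -> Tuple[Matrix, List[int]]:
    """Factor a square matrix so that P*A = L*U (illustrative sketch).

    Returns (lu, perm). L (unit diagonal, below the diagonal) and U (on and
    above the diagonal) are packed into one matrix. perm[i] is the index of
    the original row that ended up in row i.
    """
    n = len(a)
    lu = [row[:] for row in a]  # work on a copy, leave the input untouched
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: pick the row with the largest entry in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        # Eliminate column k below the pivot and store the multipliers.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            m = lu[i][k]
            for j in range(k + 1, n):
                lu[i][j] -= m * lu[k][j]
    return lu, perm


def lu_solve(lu: Matrix, perm: List[int], b: List[float]) -> List[float]:
    """Solve A*x = b given the packed factors from lu_factor."""
    n = len(lu)
    # Forward substitution: L*y = P*b (L has an implicit unit diagonal).
    y = [0.0] * n
    for i in range(n):
        y[i] = b[perm[i]] - sum(lu[i][j] * y[j] for j in range(i))
    # Back substitution: U*x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(lu[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / lu[i][i]
    return x


if __name__ == "__main__":
    A = [[2.0, 1.0, 1.0], [4.0, -6.0, 0.0], [-2.0, 7.0, 2.0]]
    b = [5.0, -2.0, 9.0]
    lu, perm = lu_factor(A)
    print(lu_solve(lu, perm, b))
```

The triple loop in `lu_factor` performs on the order of n³ floating-point operations for an n × n matrix, which is why dense factorization at the 10,000 × 10,000 scale mentioned in the abstract is a natural target for acceleration.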

      • Published in

        ACM Transactions on Reconfigurable Technology and Systems  Volume 5, Issue 1
        March 2012
        148 pages
        ISSN:1936-7406
        EISSN:1936-7414
        DOI:10.1145/2133352
        Copyright © 2012 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 23 March 2012
        • Accepted: 1 July 2011
        • Revised: 1 January 2011
        • Received: 1 October 2010
        Qualifiers

        • research-article
        • Research
        • Refereed
