Research Article

A Fully Pipelined FPGA Architecture of a Factored Restricted Boltzmann Machine Artificial Neural Network

Published: 01 February 2014

Abstract

Artificial neural networks (ANNs) are a natural target for hardware acceleration by FPGAs and GPGPUs because commercial-scale applications can require days to weeks to train using CPUs, and the algorithms are highly parallelizable. Previous work on FPGAs has shown how hardware parallelism can be used to accelerate a “Restricted Boltzmann Machine” (RBM) ANN algorithm, and how to distribute computation across multiple FPGAs.

Here we describe a fully pipelined parallel architecture that exploits “mini-batch” training (combining many input cases to compute each set of weight updates) to further accelerate ANN training. We implement on an FPGA, for the first time to our knowledge, a more powerful variant of the basic RBM, the “Factored RBM” (fRBM). The fRBM has proved valuable in learning transformations and in discovering features that are present across multiple types of input. We obtain (in simulation) a 100-fold acceleration (vs. CPU software) for an fRBM having N = 256 units in each of its four groups (two input, one output, one intermediate group of units) running on a Virtex-6 LX760 FPGA. Many of the architectural features we implement are applicable not only to fRBMs, but to basic RBMs and other ANN algorithms more broadly.
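To make the "mini-batch" training idea concrete, here is a minimal software sketch of one contrastive-divergence (CD-1) weight update for a basic binary RBM, averaged over a mini-batch of input cases. This is an illustrative assumption, not the paper's hardware architecture or its factored (fRBM) variant: biases are omitted, the function names are hypothetical, and CD-1 is used as the standard RBM training rule. The point is that every case in the batch contributes independent positive- and negative-phase statistics to a single weight update, which is exactly the independence a fully pipelined design can exploit.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_minibatch_update(W, v0, lr=0.01, rng=None):
    """One CD-1 mini-batch weight update for a basic binary RBM (biases omitted).

    W  : (n_visible, n_hidden) weight matrix
    v0 : (batch, n_visible) mini-batch of input cases
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    batch = v0.shape[0]
    # Positive phase: hidden unit probabilities for every case in the batch.
    h0_prob = sigmoid(v0 @ W)
    # Sample binary hidden states for the Gibbs step.
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(v0.dtype)
    # Negative phase: one Gibbs step back to the visible units and up again.
    v1_prob = sigmoid(h0 @ W.T)
    h1_prob = sigmoid(v1_prob @ W)
    # Gradient averaged over the mini-batch: many cases, one weight update.
    dW = (v0.T @ h0_prob - v1_prob.T @ h1_prob) / batch
    return W + lr * dW
```

Because the per-case phase computations are independent matrix-vector products, a hardware pipeline can stream one case per cycle through the same multiply-accumulate logic and fold the batch average into the final update.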

