Abstract
Artificial neural networks (ANNs) are a natural target for hardware acceleration by FPGAs and GPGPUs because commercial-scale applications can require days to weeks to train using CPUs, and the algorithms are highly parallelizable. Previous work on FPGAs has shown how hardware parallelism can be used to accelerate a “Restricted Boltzmann Machine” (RBM) ANN algorithm, and how to distribute computation across multiple FPGAs.
Here we describe a fully pipelined parallel architecture that exploits “mini-batch” training (combining many input cases to compute each set of weight updates) to further accelerate ANN training. We implement on an FPGA, for the first time to our knowledge, a more powerful variant of the basic RBM, the “Factored RBM” (fRBM). The fRBM has proved valuable in learning transformations and in discovering features that are present across multiple types of input. We obtain (in simulation) a 100-fold acceleration (vs. CPU software) for an fRBM having N = 256 units in each of its four groups (two input, one output, one intermediate group of units) running on a Virtex-6 LX760 FPGA. Many of the architectural features we implement are applicable not only to fRBMs, but to basic RBMs and other ANN algorithms more broadly.
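The key idea behind mini-batch training, as described above, is that the weight update is computed from many input cases at once, so the positive and negative phases reduce to dense matrix multiplies over the whole batch. The sketch below illustrates this for one contrastive-divergence (CD-1) step on a basic RBM without bias terms; it is a minimal illustration, not the paper's fRBM architecture, and all names, sizes, and the learning rate are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_minibatch_update(W, v_batch, lr=0.01):
    """One CD-1 weight update computed from an entire mini-batch.

    W: (n_visible, n_hidden) weight matrix; v_batch: (batch, n_visible).
    Illustrative only -- biases and the factored (fRBM) structure are omitted.
    """
    # Positive phase: hidden-unit probabilities for every case in the batch.
    h_prob = sigmoid(v_batch @ W)
    h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)

    # Negative phase: one Gibbs step back to the visible units, then hidden again.
    v_recon = sigmoid(h_sample @ W.T)
    h_recon = sigmoid(v_recon @ W)

    # Batched outer products: a single matrix multiply per phase covers the
    # whole mini-batch -- the structure a pipelined hardware design exploits.
    batch = v_batch.shape[0]
    grad = (v_batch.T @ h_prob - v_recon.T @ h_recon) / batch
    return W + lr * grad

W = rng.standard_normal((6, 4)) * 0.1
v = (rng.random((8, 6)) < 0.5).astype(float)  # mini-batch of 8 binary cases
W_new = cd1_minibatch_update(W, v)
```

Because each phase is a single matrix product over the batch, the arithmetic maps naturally onto a hardware pipeline that streams one input case per cycle and accumulates the weight-update terms.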
A Fully Pipelined FPGA Architecture of a Factored Restricted Boltzmann Machine Artificial Neural Network