
VIBNN: Hardware Acceleration of Bayesian Neural Networks

Published: 19 March 2018

Abstract

Bayesian Neural Networks (BNNs) have been proposed to address the problem of model uncertainty in training and inference. By introducing weights associated with conditioned probability distributions, BNNs can resolve the overfitting issue commonly seen in conventional neural networks and allow for small-data training through the variational inference process. The frequent use of Gaussian random variables in this process requires a properly optimized Gaussian Random Number Generator (GRNG), and the high hardware cost of conventional GRNGs makes the hardware implementation of BNNs challenging. In this paper, we propose VIBNN, an FPGA-based hardware accelerator design for variational inference on BNNs. We explore the design space for the massive Gaussian-variable sampling workload in BNNs. Specifically, we introduce two high-performance Gaussian (pseudo) random number generators: 1) the RAM-based Linear Feedback Gaussian Random Number Generator (RLF-GRNG), inspired by the properties of the binomial distribution and linear feedback logic; and 2) the Bayesian Neural Network-oriented Wallace Gaussian Random Number Generator. To achieve high scalability and efficient memory access, we propose a deeply pipelined accelerator architecture with fast execution and good hardware utilization. Experimental results demonstrate that the proposed VIBNN implementation on an FPGA achieves a throughput of 321,543.4 images/s and an energy efficiency of up to 52,694.8 images/J while maintaining accuracy similar to that of its software counterpart.
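The two generators named above rest on two classical sampling ideas: a sum of n independent random bits follows a Binomial(n, 1/2) distribution, which approaches a Gaussian as n grows (the de Moivre-Laplace theorem), and the Wallace method renews a pool of existing Gaussian variates through orthogonal transforms, which preserve Gaussianity without evaluating any transcendental function. The Python sketch below illustrates both ideas in software only; it is not the paper's hardware design, and the LFSR width and taps, the bit count n, the pool size, and all function names are illustrative assumptions introduced here.

    import math
    import random

    class LFSR16:
        """16-bit Fibonacci LFSR (taps 16, 14, 13, 11), a maximal-length configuration."""
        def __init__(self, seed=0xACE1):
            self.state = seed & 0xFFFF

        def next_bit(self):
            # XOR the tapped bits; feed the result back in as the new MSB.
            bit = ((self.state >> 0) ^ (self.state >> 2) ^
                   (self.state >> 3) ^ (self.state >> 5)) & 1
            self.state = (self.state >> 1) | (bit << 15)
            return bit

    def binomial_gaussian(lfsr, n=128):
        # Binomial-summation idea: the sum of n pseudo-random bits is
        # Binomial(n, 1/2) ~ N(n/2, n/4); standardize to approximate N(0, 1).
        # (The paper's RAM-based feedback design draws its bits far less
        # naively than consecutive outputs of a single LFSR.)
        s = sum(lfsr.next_bit() for _ in range(n))
        return (s - n / 2) / math.sqrt(n / 4)

    def wallace_refresh(pool):
        # Wallace-style idea: mix groups of four pool values with an
        # orthogonal matrix (a 4x4 Hadamard matrix scaled by 1/2);
        # orthogonal combinations of i.i.d. N(0, 1) values are again N(0, 1).
        random.shuffle(pool)  # stand-in for the hardware's pseudo-random addressing
        for i in range(0, len(pool) - 3, 4):
            a, b, c, d = pool[i:i + 4]
            t = (a + b + c + d) / 2.0
            pool[i:i + 4] = [t - a, t - b, t - c, t - d]
        return pool

    lfsr = LFSR16()
    print([round(binomial_gaussian(lfsr), 3) for _ in range(4)])

    pool = [random.gauss(0.0, 1.0) for _ in range(1024)]  # seed pool; hardware keeps a stored pool
    pool = wallace_refresh(pool)
    print([round(x, 3) for x in pool[:4]])

Note that at sampling time both routines use only additions and shifts, avoiding the multiplications and transcendental functions (log, sqrt, sin/cos) of Box-Muller-style generators; this is broadly why binomial- and Wallace-style designs suit FPGA implementation.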



Published in

  ACM SIGPLAN Notices, Volume 53, Issue 2 (ASPLOS '18), February 2018, 809 pages.
  ISSN: 0362-1340; EISSN: 1558-1160; DOI: 10.1145/3296957

  Also in ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, March 2018, 827 pages.
  ISBN: 9781450349116; DOI: 10.1145/3173162

Copyright © 2018 ACM

Publisher

  Association for Computing Machinery, New York, NY, United States

Qualifiers

  research-article
