SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing

Published: 04 April 2017

Abstract

With the recent advances in wearable devices and the Internet of Things (IoT), it has become attractive to implement Deep Convolutional Neural Networks (DCNNs) in embedded and portable systems. Currently, executing software-based DCNNs requires high-performance servers, restricting their widespread deployment on embedded and mobile IoT devices. To overcome this obstacle, considerable research effort has been devoted to developing highly-parallel and specialized DCNN accelerators using GPGPUs, FPGAs, or ASICs.

Stochastic Computing (SC), which uses a bit-stream to represent a number within [-1, 1] by counting the number of ones in the stream, has high potential for implementing DCNNs with high scalability and an ultra-low hardware footprint. Since multiplications and additions in SC can be performed with AND gates and multiplexers, respectively, significant reductions in power (energy) and hardware footprint can be achieved compared with conventional binary arithmetic implementations. These tremendous savings in power (energy) and hardware resources open an immense design space for enhancing the scalability and robustness of hardware DCNNs.
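The arithmetic described above can be sketched in software. The following is an illustrative Python simulation of unipolar SC (values in [0, 1]), where an AND gate multiplies two independent streams and a multiplexer with a half-rate select stream computes a scaled sum; the bipolar [-1, 1] coding referenced in the abstract uses an XNOR gate for multiplication instead, but the principle is the same. This is a minimal sketch for intuition, not the paper's hardware design:

```python
import random

def to_stream(x, n, rng):
    """Encode x in [0, 1] as an n-bit unipolar stream with P(bit = 1) = x."""
    return [1 if rng.random() < x else 0 for _ in range(n)]

def from_stream(bits):
    """Decode a unipolar stream by counting the ones."""
    return sum(bits) / len(bits)

def sc_mul(a, b):
    """Bitwise AND of two independent unipolar streams: P(1) = x * y."""
    return [p & q for p, q in zip(a, b)]

def sc_scaled_add(a, b, sel):
    """A 2-to-1 MUX with a p = 0.5 select stream outputs (x + y) / 2."""
    return [p if s else q for p, q, s in zip(a, b, sel)]

rng = random.Random(42)
n = 1 << 16                        # longer streams -> lower variance
a = to_stream(0.5, n, rng)
b = to_stream(0.8, n, rng)
sel = to_stream(0.5, n, rng)

product = from_stream(sc_mul(a, b))                 # close to 0.5 * 0.8 = 0.40
scaled_sum = from_stream(sc_scaled_add(a, b, sel))  # close to (0.5 + 0.8) / 2 = 0.65
```

Note that the MUX computes a *scaled* addition, which is why SC designs must track scaling factors across layers; accuracy improves with stream length at the cost of latency.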

This paper presents SC-DCNN, the first comprehensive design and optimization framework for SC-based DCNNs, developed using a bottom-up approach. We first present the designs of function blocks that perform the basic operations in a DCNN, including inner product, pooling, and activation function. We then propose four designs of feature extraction blocks, which are in charge of extracting features from input feature maps, by connecting different basic function blocks with joint optimization. Moreover, efficient weight storage methods are proposed to reduce area and power (energy) consumption. Putting it all together, with the feature extraction blocks carefully selected, SC-DCNN is holistically optimized to minimize area and power (energy) consumption while maintaining high network accuracy. Experimental results demonstrate that LeNet-5 implemented in SC-DCNN occupies only 17 mm² of area and consumes 1.53 W of power, while achieving a throughput of 781250 images/s, an area efficiency of 45946 images/s/mm², and an energy efficiency of 510734 images/J.
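Among the function blocks listed above, the pooling block illustrates SC's hardware economy particularly well: average pooling over four inputs reduces to a single 4-to-1 multiplexer with a uniformly random select signal. A hedged Python sketch under the same unipolar-stream assumption (illustrative only, not the paper's exact circuit; the input values are hypothetical):

```python
import random

def to_stream(x, n, rng):
    """Encode x in [0, 1] as an n-bit unipolar stream with P(bit = 1) = x."""
    return [1 if rng.random() < x else 0 for _ in range(n)]

def sc_mean_pool(streams, rng):
    """A 4-to-1 MUX whose select is uniform over the inputs emits a stream
    whose probability of 1 is the average of the input probabilities."""
    n = len(streams[0])
    return [streams[rng.randrange(len(streams))][i] for i in range(n)]

rng = random.Random(7)
n = 1 << 16
inputs = [0.2, 0.4, 0.6, 0.8]      # hypothetical 2x2 feature-map window
streams = [to_stream(v, n, rng) for v in inputs]
pooled = sum(sc_mean_pool(streams, rng)) / n   # close to mean(inputs) = 0.5
```

A binary implementation of the same operation needs adders and a shifter per window; in SC it is a single multiplexer, which is the kind of saving that makes the reported area and power figures plausible.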


Published in

ACM SIGPLAN Notices, Volume 52, Issue 4 (ASPLOS '17), April 2017, 811 pages. ISSN: 0362-1340, EISSN: 1558-1160. DOI: 10.1145/3093336

ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, April 2017, 856 pages. ISBN: 9781450344654. DOI: 10.1145/3037697

Copyright © 2017 ACM. Publisher: Association for Computing Machinery, New York, NY, United States.

