SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing

Abstract
With the recent advance of wearable devices and the Internet of Things (IoT), it has become attractive to implement Deep Convolutional Neural Networks (DCNNs) in embedded and portable systems. Currently, executing software-based DCNNs requires high-performance servers, restricting their widespread deployment on embedded and mobile IoT devices. To overcome this obstacle, considerable research effort has been devoted to developing highly parallel and specialized DCNN accelerators using GPGPUs, FPGAs, or ASICs.
Stochastic Computing (SC), which uses a bit-stream to represent a number within [-1, 1] by counting the number of ones in the bit-stream, has high potential for implementing DCNNs with high scalability and an ultra-low hardware footprint. Since multiplications and additions can be computed with AND gates and multiplexers in SC, significant reductions in power (energy) and hardware footprint can be achieved compared to conventional binary arithmetic implementations. These tremendous savings in power (energy) and hardware resources open up an immense design space for enhancing the scalability and robustness of hardware DCNNs.
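The AND-gate multiplication and multiplexer-based addition mentioned above can be illustrated with a minimal software simulation. The sketch below uses unipolar SC encoding (a value in [0, 1] is the fraction of ones in the stream); the helper names are our own, not from the paper:

```python
import random

def to_bitstream(p, length, rng):
    """Encode a probability p in [0, 1] as a unipolar stochastic bit-stream."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def value(stream):
    """Decode by counting ones: the represented value is the fraction of 1s."""
    return sum(stream) / len(stream)

rng = random.Random(42)
N = 100_000

a = to_bitstream(0.5, N, rng)
b = to_bitstream(0.6, N, rng)

# Multiplication: a bitwise AND of two independent unipolar streams gives a
# stream whose density of ones is the product 0.5 * 0.6 = 0.3.
prod = [x & y for x, y in zip(a, b)]

# Scaled addition: a 2-to-1 multiplexer whose select stream has density 0.5
# computes (0.5 + 0.6) / 2 = 0.55, keeping the result inside the legal range.
sel = to_bitstream(0.5, N, rng)
summed = [x if s else y for s, x, y in zip(sel, a, b)]

print(value(prod))    # close to 0.30
print(value(summed))  # close to 0.55
```

Note that the multiplexer computes a scaled sum, (a + b) / 2 rather than a + b; this scaling is inherent to SC addition, since the result must stay representable as a probability.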
This paper presents SC-DCNN, the first comprehensive design and optimization framework for SC-based DCNNs, developed using a bottom-up approach. We first present designs of function blocks that perform the basic operations in a DCNN: inner product, pooling, and activation function. We then propose four designs of feature extraction blocks, which are in charge of extracting features from input feature maps, by connecting the basic function blocks with joint optimization. Moreover, efficient weight storage methods are proposed to reduce area and power (energy) consumption. Putting it all together, with the feature extraction blocks carefully selected, SC-DCNN is holistically optimized to minimize area and power (energy) consumption while maintaining high network accuracy. Experimental results demonstrate that LeNet-5 implemented in SC-DCNN consumes only 17 mm^2 of area and 1.53 W of power, and achieves a throughput of 781,250 images/s, an area efficiency of 45,946 images/s/mm^2, and an energy efficiency of 510,734 images/J.
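The inner-product function block described above combines the two SC primitives: per-weight multiplication followed by a scaled multiplexer sum. The [-1, 1] range quoted earlier corresponds to bipolar SC encoding, in which multiplication becomes an XNOR gate. The following is an illustrative simulation of a 4-input stochastic inner product under those standard encodings, not the paper's hardware design; all names and parameter values are our own:

```python
import random

def bipolar_stream(v, length, rng):
    """Bipolar SC encoding: a value v in [-1, 1] maps to bit probability (v + 1) / 2."""
    p = (v + 1) / 2
    return [1 if rng.random() < p else 0 for _ in range(length)]

def bipolar_value(stream):
    """Decode a bipolar stream: v = 2 * (fraction of ones) - 1."""
    return 2 * sum(stream) / len(stream) - 1

rng = random.Random(7)
N = 200_000
xs = [0.4, -0.2, 0.6, 0.1]   # example inputs
ws = [0.5, 0.3, -0.4, 0.8]   # example weights

x_streams = [bipolar_stream(v, N, rng) for v in xs]
w_streams = [bipolar_stream(v, N, rng) for v in ws]

# Bipolar multiplication is an XNOR gate: the output bit is 1 ^ (a ^ b).
prods = [[1 ^ (a ^ b) for a, b in zip(xs_, ws_)]
         for xs_, ws_ in zip(x_streams, w_streams)]

# A 4-to-1 multiplexer with uniformly random select lines computes the
# scaled sum of the four product streams: (sum of x_i * w_i) / 4.
out = [prods[rng.randrange(4)][t] for t in range(N)]

expected = sum(x * w for x, w in zip(xs, ws)) / 4
print(bipolar_value(out), expected)  # the two values should nearly match
```

The 1/4 scaling factor of the multiplexer tree is one reason accuracy-aware block selection matters: deeper mux trees shrink the signal relative to the stochastic noise, which motivates the joint optimization of feature extraction blocks discussed in the paper.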
Published in the Proceedings of ASPLOS '17, the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems.