Abstract
Deep Neural Networks (DNNs) have advanced the state of the art in a variety of machine learning tasks and are deployed in a growing number of products and services. However, the computational requirements of training and evaluating large-scale DNNs are growing much faster than the capabilities of the hardware platforms on which they are executed. One promising approach to this challenge is to exploit the error-resilient nature of DNNs by skipping or approximating computations that have negligible impact on classification accuracy. Almost all prior efforts in this direction propose static DNN approximations: pruning network connections, computing at lower precision, or compressing weights.
In this work, we propose Dynamic Variable Effort Deep Neural Networks (DyVEDeep) to reduce the computational requirements of DNNs during inference. Complementary to the aforementioned static approaches, DyVEDeep is a dynamic approach that exploits heterogeneity across DNN inputs to improve compute efficiency while maintaining comparable classification accuracy and without requiring any re-training. DyVEDeep equips DNNs with dynamic effort mechanisms that identify the computations critical to classifying a given input and focus effort on those computations, while skipping or approximating the rest. We propose three dynamic effort mechanisms that operate at different levels of granularity, viz., the neuron, feature, and layer levels. We build DyVEDeep versions of six popular image recognition benchmarks (CIFAR-10, AlexNet, OverFeat, VGG-16, SqueezeNet, and Deep-Compressed-AlexNet) within the Caffe deep learning framework. We evaluate DyVEDeep on two platforms—a high-performance server with a 2.7 GHz Intel Xeon E5-2680 processor and 128 GB of memory, and a low-power Raspberry Pi board with an ARM Cortex-A53 processor and 1 GB of memory. Across all benchmarks, DyVEDeep achieves a 2.47×–5.15× reduction in the number of scalar operations, which translates to 1.94×–2.23× and 1.46×–3.46× performance improvements over well-optimized baselines on the Xeon server and the Raspberry Pi, respectively, with comparable classification accuracy.
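To make the neuron-level idea concrete, below is a minimal NumPy sketch of one way such a dynamic effort mechanism could skip non-critical multiply-accumulates: accumulate the highest-magnitude weights first and terminate early once the neuron's output is provably clipped to zero by ReLU. This is an illustrative sketch under stated assumptions, not the paper's exact heuristic; the `sample_frac` knob and the weight-ordering policy are hypothetical.

```python
import numpy as np

def dynamic_effort_neuron(weights, inputs, sample_frac=0.25):
    """Evaluate one ReLU neuron with an early-termination effort mechanism.

    Accumulate the largest-magnitude weights first; if the partial sum plus
    an upper bound on the remaining terms is still negative, the ReLU output
    is provably zero and the remaining multiply-accumulates can be skipped.
    (Illustrative sketch only; not the mechanism proposed in the paper.)
    """
    n = weights.size
    k = max(1, int(n * sample_frac))
    # Process high-magnitude weights first; they dominate the dot product.
    order = np.argsort(-np.abs(weights))
    partial = float(np.dot(weights[order[:k]], inputs[order[:k]]))
    # Upper bound on what the remaining terms could possibly contribute.
    bound = float(np.sum(np.abs(weights[order[k:]])) * np.max(np.abs(inputs)))
    if partial + bound < 0.0:
        return 0.0  # ReLU would clip this neuron anyway: skip the rest
    # Critical computation: finish the exact dot product.
    full = partial + float(np.dot(weights[order[k:]], inputs[order[k:]]))
    return max(full, 0.0)

# Toy usage: with negatively biased weights, many neurons terminate early.
rng = np.random.default_rng(0)
w = rng.standard_normal(1024) - 0.5
x = rng.random(1024)
print(dynamic_effort_neuron(w, x))
```

In a real deployment the per-neuron weight ordering would be precomputed offline, so the only runtime overhead of the skip test is a single comparison against the cached bound.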