
DyVEDeep: Dynamic Variable Effort Deep Neural Networks

Published: 11 June 2020

Abstract

Deep Neural Networks (DNNs) have advanced the state-of-the-art in a variety of machine learning tasks and are deployed in increasing numbers of products and services. However, the computational requirements of training and evaluating large-scale DNNs are growing at a much faster pace than the capabilities of the underlying hardware platforms that they are executed upon. To address this challenge, one promising approach is to exploit the error resilient nature of DNNs by skipping or approximating computations that have negligible impact on classification accuracy. Almost all prior efforts in this direction propose static DNN approximations by either pruning network connections, implementing computations at lower precision, or compressing weights.
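To make the static approaches concrete, the sketch below illustrates magnitude-based weight pruning, one of the static approximations mentioned above. This is a generic illustration of the idea (not the method of any particular cited work): the pruned connections are fixed once, independent of the input.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude weights.

    A *static* approximation: the same connections are removed for
    every input the network will ever see.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
pruned = magnitude_prune(w, sparsity=0.9)
# Roughly 90% of the entries are now exactly zero.
```

In practice, pruning is usually followed by a re-training step to recover accuracy; the dynamic approach proposed in this article explicitly avoids that requirement.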

In this work, we propose Dynamic Variable Effort Deep Neural Networks (DyVEDeep) to reduce the computational requirements of DNNs during inference. Complementary to the aforementioned static approaches, DyVEDeep is a dynamic approach that exploits heterogeneity in the DNN inputs to improve their compute efficiency with comparable classification accuracy and without requiring any re-training. DyVEDeep equips DNNs with dynamic effort mechanisms that identify computations critical to classifying a given input and focus computational effort only on the critical computations, while skipping or approximating the rest. We propose three dynamic effort mechanisms that operate at different levels of granularity, viz., the neuron, feature, and layer levels. We build DyVEDeep versions of six popular image recognition benchmarks (CIFAR-10, AlexNet, OverFeat, VGG-16, SqueezeNet, and Deep-Compressed-AlexNet) within the Caffe deep-learning framework. We evaluate DyVEDeep on two platforms: a high-performance server with a 2.7 GHz Intel Xeon E5-2680 processor and 128 GB memory, and a low-power Raspberry Pi board with an ARM Cortex A53 processor and 1 GB memory. Across all benchmarks, DyVEDeep achieves a 2.47×–5.15× reduction in the number of scalar operations, which translates to 1.94×–2.23× and 1.46×–3.46× performance improvement over well-optimized baselines on the Xeon server and the Raspberry Pi, respectively, with comparable classification accuracy.
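The flavor of a neuron-level dynamic effort mechanism can be sketched as follows. This is a hypothetical illustration (not the article's exact mechanism): a ReLU neuron's dot product is accumulated in chunks, and evaluation stops early once an optimistic bound on the remaining terms shows the pre-activation cannot become positive, so the ReLU output is provably zero.

```python
import numpy as np

def dynamic_relu_neuron(w, x, chunk=16):
    """Evaluate relu(w . x) with input-dependent effort (illustrative).

    The dot product is accumulated chunk by chunk. After each chunk, an
    upper bound on what the remaining terms could still contribute is
    checked; if even that bound cannot push the sum above zero, the
    remaining multiply-accumulates are skipped.
    """
    n = len(w)
    acc = 0.0
    for start in range(0, n, chunk):
        acc += float(np.dot(w[start:start + chunk], x[start:start + chunk]))
        # Optimistic bound: every remaining term contributes its full magnitude.
        remaining = float(np.sum(np.abs(w[start + chunk:]) * np.abs(x[start + chunk:])))
        if acc + remaining <= 0.0:
            return 0.0  # ReLU output is provably zero; skip the rest.
    return max(acc, 0.0)
```

Because the bound is sound, this particular sketch is exact (it always returns relu(w · x)); how much work it skips depends on the input, which is precisely the input heterogeneity that dynamic approaches exploit. A practical mechanism would use cheaper, precomputed bounds and could trade a small accuracy loss for larger savings.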

