Research Article

Optimizing Deep Learning Inference on Embedded Systems Through Adaptive Model Selection

Published: 06 February 2020

Abstract

Deep neural networks (DNNs) are becoming a key enabling technique for many application domains. However, on-device inference on battery-powered, resource-constrained embedded systems is often infeasible due to the prohibitively long inference times and resource requirements of many DNNs. Offloading computation to the cloud is often unacceptable due to privacy concerns, high latency, or the lack of connectivity. Although compression algorithms often succeed in reducing inference times, they come at the cost of reduced accuracy.

This article presents a new, alternative approach to enable efficient execution of DNNs on embedded devices. Our approach dynamically determines which DNN to use for a given input by considering the desired accuracy and inference time. It employs machine learning to build a low-cost predictive model that quickly selects a pre-trained DNN for a given input and optimization constraint. We achieve this by first training a predictive model offline and then using the learned model to select a DNN for new, unseen inputs. We apply our approach to two representative DNN domains: image classification and machine translation. We evaluate our approach on a Jetson TX2 embedded deep learning platform, considering a range of influential DNN models including convolutional and recurrent neural networks. For image classification, we achieve a 1.8x reduction in inference time with a 7.52% improvement in accuracy over the most capable single DNN model. For machine translation, we achieve a 1.34x reduction in inference time over the most capable single model with little impact on the quality of translation.
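The core idea of the approach can be sketched as follows. This is an illustrative sketch, not the authors' code: the candidate model names, the per-model costs, the input features, and the 1-nearest-neighbour premodel are all hypothetical placeholders standing in for the learned predictive model described above.

```python
# Sketch of adaptive model selection: a cheap "premodel" predicts, from
# inexpensive input features, the least expensive DNN expected to handle
# the input correctly. All names and numbers below are illustrative.

# Candidate DNNs ordered from cheapest/least accurate to costliest/most
# accurate, with made-up per-input inference times in milliseconds.
COST_MS = {"mobilenet": 30.0, "inception": 100.0, "resnet152": 225.0}

def extract_features(image_stats):
    """Cheap features computed from the input itself (e.g. brightness,
    edge density) -- fast enough that their cost is negligible next to
    running a DNN."""
    return (image_stats["brightness"], image_stats["edge_density"])

class Premodel:
    """A 1-nearest-neighbour classifier standing in for the learned
    premodel: it maps input features to the cheapest adequate DNN."""
    def __init__(self):
        self.examples = []  # (features, best_model_label) pairs

    def fit(self, feature_label_pairs):
        # Offline phase: each label records the cheapest DNN that was
        # correct on that training input.
        self.examples = list(feature_label_pairs)

    def select(self, features):
        def dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))
        _, label = min(self.examples, key=lambda e: dist(e[0], features))
        return label

# Offline training on (features, cheapest-correct-model) examples
# (these training points are fabricated for illustration).
premodel = Premodel()
premodel.fit([
    ((0.9, 0.1), "mobilenet"),   # easy input: the cheap model suffices
    ((0.5, 0.5), "inception"),
    ((0.2, 0.9), "resnet152"),   # hard input: needs the deepest model
])

# Online phase: pick a DNN for a new, unseen input.
choice = premodel.select(extract_features(
    {"brightness": 0.85, "edge_density": 0.15}))
print(choice, COST_MS[choice])  # prints: mobilenet 30.0
```

The design point this sketch captures is that the selector must be far cheaper than any of the DNNs it chooses between; otherwise its cost erodes the time saved by picking a small model for easy inputs.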

