DynO: Dynamic Onloading of Deep Neural Networks from Cloud to Device

Published: 18 October 2022

Abstract

Recently, there has been an explosive growth of mobile and embedded applications using convolutional neural networks (CNNs). To alleviate their excessive computational demands, developers have traditionally resorted to cloud offloading, inducing high infrastructure costs and a strong dependence on networking conditions. At the other end, the emergence of powerful SoCs is gradually enabling on-device execution. Nonetheless, low- and mid-tier platforms still struggle to run state-of-the-art CNNs at acceptable performance. In this article, we present DynO, a distributed inference framework that combines the best of both worlds to address several challenges, such as device heterogeneity, varying bandwidth, and multi-objective requirements. The key components that enable this are its novel CNN-specific data packing method, which exploits the variability of precision needs in different parts of the CNN when onloading computation, and its novel scheduler, which jointly tunes the partition point and the transferred data precision at runtime to adapt inference to its execution environment. Quantitative evaluation shows that DynO outperforms the current state of the art, improving throughput by over an order of magnitude over device-only execution and by up to 7.9× over competing CNN offloading systems, with up to 60× less data transferred.
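The abstract describes reducing the precision of the activations exchanged at the partition point before transfer. As a minimal sketch of that general idea (not DynO's actual implementation; the linear quantizer, function names, and fixed 4-bit setting below are illustrative assumptions), an intermediate feature map can be quantized to a low bitwidth and its codes bit-packed before transmission:

```python
import numpy as np

def quantize_and_pack(act: np.ndarray, bits: int):
    """Linearly quantize a float32 activation tensor to `bits` bits (bits <= 8)
    and pack the codes densely for transmission."""
    lo, hi = float(act.min()), float(act.max())
    scale = (hi - lo) / (2 ** bits - 1) or 1.0  # guard against a constant tensor
    codes = np.round((act - lo) / scale).astype(np.uint8)
    # Expand each code into its `bits` binary digits (MSB first), then pack 8 digits/byte.
    digits = ((codes[..., None] >> np.arange(bits - 1, -1, -1)) & 1).astype(np.uint8)
    payload = np.packbits(digits.reshape(-1))
    return payload, (lo, scale, act.shape)

def unpack_and_dequantize(payload: np.ndarray, meta, bits: int) -> np.ndarray:
    """Inverse of quantize_and_pack, run on the receiving side."""
    lo, scale, shape = meta
    digits = np.unpackbits(payload)[: np.prod(shape) * bits].reshape(-1, bits)
    codes = (digits << np.arange(bits - 1, -1, -1)).sum(axis=1)
    return (codes * scale + lo).reshape(shape).astype(np.float32)

act = np.random.randn(1, 64, 28, 28).astype(np.float32)  # a mid-network feature map
payload, meta = quantize_and_pack(act, bits=4)
rec = unpack_and_dequantize(payload, meta, bits=4)
print(act.nbytes / payload.nbytes)          # → 8.0 (4-bit codes vs. raw float32)
print(float(np.abs(act - rec).max()) <= meta[1])  # error within one quantization step
```

Since, per the abstract, precision needs vary across different parts of the CNN, the transmitted bitwidth becomes a tunable knob that a scheduler can adjust jointly with the partition point.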



Published in:
• ACM Transactions on Embedded Computing Systems, Volume 21, Issue 6 (November 2022), 498 pages
• ISSN: 1539-9087; EISSN: 1558-3465
• DOI: 10.1145/3561948
• Editor: Tulika Mitra


Publisher: Association for Computing Machinery, New York, NY, United States

Publication History:
• Received: 16 April 2021
• Revised: 15 December 2021
• Accepted: 7 January 2022
• Online AM: 26 January 2022
• Published: 18 October 2022

Qualifiers: research-article, refereed
