Abstract
Recently, there has been an explosive growth of mobile and embedded applications that use convolutional neural networks (CNNs). To alleviate their excessive computational demands, developers have traditionally resorted to cloud offloading, incurring high infrastructure costs and a strong dependence on networking conditions. At the other end, the emergence of powerful SoCs is gradually enabling on-device execution. Nonetheless, low- and mid-tier platforms still struggle to run state-of-the-art CNNs at sufficient speed. In this article, we present DynO, a distributed inference framework that combines the best of both worlds to address several challenges, such as device heterogeneity, varying bandwidth, and multi-objective requirements. Key components enabling this are its novel CNN-specific data packing method, which exploits the variability of precision needs across different parts of the CNN when onloading computation, and its novel scheduler, which jointly tunes the partition point and the transferred data precision at runtime to adapt inference to its execution environment. Quantitative evaluation shows that DynO outperforms the current state of the art, improving throughput by over an order of magnitude compared with device-only execution and by up to 7.9× over competing CNN offloading systems, while transferring up to 60× less data.
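The joint tuning of partition point and transfer precision described above can be illustrated with a minimal sketch. The code below is a hypothetical illustration, not DynO's implementation: `pack_activations` linearly quantizes an intermediate activation tensor to a chosen bitwidth before transfer, and `choose_split` enumerates (layer, bitwidth) pairs against assumed latency and size profiles to minimize estimated end-to-end latency. All function names and profile inputs are assumptions for exposition; DynO's actual packing is CNN-specific, and its scheduler also weighs accuracy and other objectives.

```python
import numpy as np

def pack_activations(x, bits):
    """Linearly quantize a float32 activation tensor to `bits` bits for transfer.
    A minimal sketch of precision-aware packing; not DynO's actual scheme."""
    lo, hi = float(x.min()), float(x.max())
    levels = 2 ** bits - 1
    q = np.round((x - lo) / (hi - lo + 1e-8) * levels).astype(np.uint32)
    return q, lo, hi

def unpack_activations(q, lo, hi, bits):
    """Dequantize on the receiving side using the transmitted (lo, hi) range."""
    levels = 2 ** bits - 1
    return q.astype(np.float32) / levels * (hi - lo) + lo

def choose_split(device_lat, server_lat, act_bytes, bandwidth, bit_options=(2, 4, 8)):
    """Pick the (partition layer, transfer bitwidth) pair minimizing estimated latency.

    device_lat[i]: cumulative on-device latency up to layer i (seconds),
    server_lat[i]: remote latency of the layers after i (seconds),
    act_bytes[i]:  float32 size of layer i's output (bytes),
    bandwidth:     estimated link throughput (bytes/second).
    These profiles are hypothetical inputs supplied by the caller."""
    best = None
    for i in range(len(device_lat)):
        for b in bit_options:
            # Quantizing to b bits shrinks the transferred payload by b/32.
            transfer = act_bytes[i] * (b / 32) / bandwidth
            total = device_lat[i] + transfer + server_lat[i]
            if best is None or total < best[0]:
                best = (total, i, b)
    return best  # (estimated latency, split layer index, bitwidth)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((64, 56, 56)).astype(np.float32)
    q, lo, hi = pack_activations(x, bits=4)
    x_hat = unpack_activations(q, lo, hi, bits=4)
    print("max abs quantization error:", np.abs(x - x_hat).max())
```

Under this simple cost model, a drop in bandwidth naturally pushes the chosen split deeper into the network or lowers the transfer bitwidth, mirroring the runtime adaptation the abstract describes.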