Abstract
The advent of Deep Neural Networks (DNNs) has empowered numerous computer-vision applications. Due to the high computational intensity of DNN models and the resource-constrained nature of Industrial Internet-of-Things (IIoT) devices, it is generally very challenging to deploy and execute DNNs efficiently in industrial scenarios. Substantial research has focused on model compression, which trades off accuracy for efficiency, or on edge-cloud offloading, which depends on high-quality infrastructure support. In this article, we present EdgeDI, a framework for executing DNN inference in a partitioned, distributed manner on a cluster of IIoT devices. To improve inference performance, EdgeDI exploits two key optimization knobs: (1) model compression based on deep architecture design, which transforms the target DNN model into a compact one that reduces the resource requirements on IIoT devices without sacrificing accuracy; (2) distributed inference based on adaptive workload partitioning, which achieves high parallelism by adaptively balancing the workload distribution among IIoT devices under heterogeneous resource conditions. We have implemented EdgeDI on PyTorch and evaluated its performance with the NEU-CLS defect classification task and two typical DNN models (i.e., VGG and ResNet) on a cluster of heterogeneous Raspberry Pi devices. The results indicate that the two proposed optimization approaches significantly outperform existing solutions in their respective domains. When combined, EdgeDI provides scalable DNN inference speedups that closely approach, and in some cases exceed, the theoretical speedup bounds, while still maintaining the desired accuracy.
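The adaptive workload partitioning idea described above — splitting a layer's input across devices in proportion to their compute capability so that all devices finish at roughly the same time — can be sketched as follows. This is an illustrative Python sketch under our own assumptions (the function name and the rows-per-second capacity model are hypothetical), not EdgeDI's actual implementation.

```python
# Hypothetical sketch: capacity-proportional partitioning of a feature
# map's rows across heterogeneous devices. Not EdgeDI's actual API.

def partition_rows(total_rows, throughputs):
    """Split `total_rows` of a feature map across devices in proportion
    to each device's measured throughput (rows/sec), so that per-device
    compute times are approximately equal."""
    total = sum(throughputs)
    # Ideal (fractional) share for each device.
    shares = [total_rows * t / total for t in throughputs]
    rows = [int(s) for s in shares]
    # Hand leftover rows to the devices with the largest remainders,
    # so the partition sums exactly to total_rows.
    leftover = total_rows - sum(rows)
    order = sorted(range(len(shares)),
                   key=lambda i: shares[i] - rows[i], reverse=True)
    for i in order[:leftover]:
        rows[i] += 1
    return rows
```

For example, a 224-row input split across one fast device (4 rows/sec) and two slower ones (2 rows/sec each) would yield shares of 112, 56, and 56 rows. In practice such a scheme would re-measure throughput periodically and re-partition when device load or network conditions change.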
Joint Architecture Design and Workload Partitioning for DNN Inference on Industrial IoT Clusters