
Joint Architecture Design and Workload Partitioning for DNN Inference on Industrial IoT Clusters

Published: 23 February 2023

Abstract

The advent of Deep Neural Networks (DNNs) has empowered numerous computer-vision applications. Due to the high computational intensity of DNN models and the resource-constrained nature of Industrial Internet-of-Things (IIoT) devices, deploying and executing DNNs efficiently in industrial scenarios is generally very challenging. Substantial research has focused on model compression, which trades accuracy for efficiency, or on edge-cloud offloading, which depends on high-quality infrastructure support. In this article, we present EdgeDI, a framework for executing DNN inference in a partitioned, distributed manner on a cluster of IIoT devices. To improve inference performance, EdgeDI exploits two key optimization knobs: (1) model compression based on deep architecture design, which transforms the target DNN model into a compact one that reduces the resource requirements on IIoT devices without sacrificing accuracy; (2) distributed inference based on adaptive workload partitioning, which achieves high parallelism by adaptively balancing the workload distribution among IIoT devices under heterogeneous resource conditions. We have implemented EdgeDI on PyTorch and evaluated its performance on the NEU-CLS defect classification task with two typical DNN models (i.e., VGG and ResNet) on a cluster of heterogeneous Raspberry Pi devices. The results indicate that the two proposed optimization approaches significantly outperform existing solutions in their respective domains. When combined, EdgeDI provides scalable DNN inference speedups that are close to, or even much higher than, the theoretical speedup bounds, while still maintaining the desired accuracy.
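The adaptive workload partitioning the abstract describes can be sketched as a capacity-proportional split: each device receives a share of the layer's work in proportion to its measured throughput. This is a minimal illustration only; the row-wise granularity, the `partition_rows` helper, and the example device speeds are assumptions for the sketch, not details taken from the article.

```python
# Hedged sketch of capacity-proportional workload partitioning across
# heterogeneous devices. Splitting an input feature map by rows and the
# example speed values are illustrative assumptions.

def partition_rows(total_rows, speeds):
    """Split `total_rows` of an input feature map across devices in
    proportion to each device's measured throughput (rows/second)."""
    total_speed = sum(speeds)
    # Ideal (fractional) share for each device.
    shares = [total_rows * s / total_speed for s in speeds]
    # Round down, then hand leftover rows to the devices with the
    # largest fractional remainders so the total is preserved.
    rows = [int(s) for s in shares]
    by_remainder = sorted(range(len(speeds)),
                          key=lambda i: shares[i] - rows[i], reverse=True)
    for i in by_remainder[:total_rows - sum(rows)]:
        rows[i] += 1
    return rows

# Example: three heterogeneous Raspberry Pi-class devices, the middle
# one twice as fast as the others.
print(partition_rows(224, [1.0, 2.0, 1.0]))  # -> [56, 112, 56]
```

Re-measuring device speeds at run time and re-invoking such a split is one simple way to keep the partition balanced as device load changes.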



Published in

ACM Transactions on Internet Technology, Volume 23, Issue 1
February 2023, 564 pages
ISSN: 1533-5399
EISSN: 1557-6051
DOI: 10.1145/3584863
Editor: Ling Liu


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 23 February 2023
• Online AM: 28 July 2022
• Accepted: 21 July 2022
• Revised: 27 June 2022
• Received: 8 April 2022
