Research Article

Hardware-friendly User-specific Machine Learning for Edge Devices

Published: 08 October 2022

Abstract

Machine learning (ML) on resource-constrained edge devices is expensive and often requires offloading computation to the cloud, which may compromise the privacy of user data. At the same time, the data processed on an edge device is user-specific and typically limited to a few inference classes. In this work, we explore building smaller, user-specific machine learning models rather than relying on a single generic, compute-intensive model that caters to a diverse range of users. We first present a hardware-friendly, lightweight pruning technique that creates user-specific models directly on mobile platforms while simultaneously serving inferences. The proposed technique shares computation between pruning and inference, customizes the backward pass of training, and chooses a pruning granularity suited to efficient processing at the edge. We then propose architectural support for pruning user-specific models on a systolic edge ML inference accelerator. We demonstrate that user-specific models provide speedups of 2.9× and 2.3× on mobile CPUs for the ResNet-50 and Inception-V3 models, respectively.
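To make the pruning flow concrete, below is a minimal sketch, not the paper's implementation, of how compute sharing between inference and pruning can work: per-channel importance scores are accumulated from activations that the forward pass already computes while serving a user's inferences, and the least important output channels of each convolution are then pruned at channel granularity. The hook-based importance collection, the L1 activation metric, and the 10% pruning ratio are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch: user-specific channel pruning that reuses inference compute.
# Assumptions (not from the paper): L1 activation importance, 10% prune ratio.
import torch
import torchvision.models as models

model = models.resnet50(weights=None).eval()

importance = {}   # per-conv-layer running importance scores
hooks = []

def collect_importance(name):
    def hook(module, inputs, output):
        # L1 norm of each output channel's activation, averaged over the
        # batch and spatial dims: reuses the forward pass that inference
        # already paid for (compute sharing between inference and pruning).
        score = output.detach().abs().mean(dim=(0, 2, 3))
        importance[name] = importance.get(name, 0) + score
    return hook

for name, m in model.named_modules():
    if isinstance(m, torch.nn.Conv2d):
        hooks.append(m.register_forward_hook(collect_importance(name)))

# Serve "user" inferences; importance accumulates as a side effect.
with torch.no_grad():
    for _ in range(4):
        model(torch.randn(1, 3, 224, 224))

for h in hooks:
    h.remove()

# Zero out the least important 10% of output channels in each conv layer.
with torch.no_grad():
    for name, m in model.named_modules():
        if isinstance(m, torch.nn.Conv2d) and name in importance:
            k = max(1, int(0.1 * m.out_channels))
            prune_idx = importance[name].argsort()[:k]
            m.weight[prune_idx] = 0
            if m.bias is not None:
                m.bias[prune_idx] = 0
```

Channel-granular pruning keeps the surviving weight tensors dense, which is what makes the resulting user-specific model friendly to mobile CPUs and systolic accelerators; unstructured weight-level pruning would instead require sparse kernels to turn the removed weights into actual speedup.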


Published in

ACM Transactions on Embedded Computing Systems, Volume 21, Issue 5 (September 2022), 526 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3561947
Editor: Tulika Mitra


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 8 October 2022
• Online AM: 31 March 2022
• Accepted: 4 March 2022
• Revised: 1 February 2022
• Received: 15 July 2021
