
CAP’NN: A Class-aware Framework for Personalized Neural Network Inference

Published: 9 December 2022

Abstract

We propose a framework for Class-aware Personalized Neural Network Inference (CAP’NN), which prunes an already-trained neural network model based on the preferences of individual users. Specifically, by adapting to the subset of output classes that each user is expected to encounter, CAP’NN is able to prune not only ineffectual neurons but also miseffectual neurons that confuse classification, without the need to retrain the network. CAP’NN also exploits the similarities among pruning requests from different users to minimize the timing overhead of pruning the network. To achieve this, we propose a clustering algorithm that groups similar classes based on the firing rates of the network’s neurons for each class, and we implement a lightweight cache architecture to store and reuse information from previously pruned networks. In our experiments with the VGG-16, AlexNet, and ResNet-152 networks, CAP’NN achieves, on average, up to 47% model size reduction while improving top-1 (top-5) classification accuracy by up to 3.9% (3.4%) when the user encounters only a subset of the trained classes.


Published in

ACM Transactions on Embedded Computing Systems, Volume 21, Issue 5 (September 2022), 526 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3561947
Editor: Tulika Mitra

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 9 December 2022
• Online AM: 21 March 2022
• Accepted: 19 February 2022
• Revised: 11 January 2022
• Received: 29 June 2021

Published in TECS Volume 21, Issue 5

Qualifiers

• research-article
• Refereed