
Neural Network Compression for Noisy Storage Devices

Published: 13 May 2023

Abstract

Compression and efficient storage of neural network (NN) parameters is critical for applications that run on resource-constrained devices. Despite the significant progress in NN model compression, there has been considerably less investigation into the actual physical storage of NN parameters. Conventionally, model compression and physical storage are decoupled, as digital storage media with error-correcting codes (ECCs) provide robust, error-free storage. However, this decoupled approach is inefficient: it ignores the overparameterization present in most NNs and forces the memory device to allocate the same amount of resources to every bit of information regardless of its importance. In this work, we investigate analog memory devices as an alternative to digital media, one that, unlike its digital counterpart, naturally provides a way to add more protection to significant bits, but is noisy and may compromise the stored model’s performance if used naively. We develop a variety of robust coding strategies for NN weight storage on analog devices and propose an approach to jointly optimize model compression and memory resource allocation. We then demonstrate the efficacy of our approach on models trained on the MNIST, CIFAR-10, and ImageNet datasets with existing compression techniques. Compared to conventional error-free digital storage, our method reduces the memory footprint by up to one order of magnitude without significantly compromising the stored model’s accuracy.
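To make the storage setting concrete, the following minimal Python sketch (our illustration, not the authors’ implementation) models each analog memory cell as an additive-Gaussian-noise channel and gives the most significant bits of each quantized weight a larger share of a fixed write-power budget. All function names and the particular power split are assumptions made for this example.

# Illustrative sketch only: quantize weights, write each bit to a noisy analog
# cell with a per-bit power allocation favoring the MSBs, read back by
# thresholding, and report the resulting reconstruction error.
import numpy as np

rng = np.random.default_rng(0)

def quantize(w, n_bits=8):
    """Uniformly quantize weights to n_bits; return integer codes and the scale."""
    scale = np.max(np.abs(w)) / (2 ** (n_bits - 1) - 1)
    codes = np.clip(np.round(w / scale), -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1)
    return codes.astype(np.int32), scale

def store_on_noisy_cells(codes, n_bits=8, total_power=8.0, noise_std=0.3):
    """Write each bit to its own analog cell; MSBs get more of the power budget."""
    # Power split proportional to 2^i for bit i (an assumption, not the paper's rule).
    bit_weights = 2.0 ** np.arange(n_bits)
    power = total_power * bit_weights / bit_weights.sum()
    amplitude = np.sqrt(power)                              # per-bit write amplitude

    unsigned = codes + 2 ** (n_bits - 1)                    # shift codes to [0, 2^n_bits)
    bits = (unsigned[:, None] >> np.arange(n_bits)) & 1     # LSB first
    signal = (2 * bits - 1) * amplitude                     # antipodal analog levels
    noisy = signal + rng.normal(0.0, noise_std, signal.shape)

    read_bits = (noisy > 0).astype(np.int32)                # threshold read-out
    read_unsigned = (read_bits * (1 << np.arange(n_bits))).sum(axis=1)
    return read_unsigned - 2 ** (n_bits - 1)

w = rng.normal(0.0, 0.05, size=10_000)                      # stand-in for a layer's weights
codes, scale = quantize(w)
recovered = store_on_noisy_cells(codes) * scale
print("mean squared error after noisy storage:", np.mean((w - recovered) ** 2))

In this toy model, raising noise_std or flattening the power split quickly degrades the recovered weights, which is the failure mode that motivates jointly optimizing model compression with the allocation of analog memory resources.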



Published in
ACM Transactions on Embedded Computing Systems, Volume 22, Issue 3
May 2023, 546 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3592782
Editor: Tulika Mitra


Publisher
Association for Computing Machinery, New York, NY, United States

        Publication History

        • Published: 13 May 2023
        • Online AM: 18 March 2023
        • Accepted: 3 February 2023
        • Revised: 2 September 2022
        • Received: 17 March 2022
Published in TECS Volume 22, Issue 3
