Abstract
Compression and efficient storage of neural network (NN) parameters are critical for applications that run on resource-constrained devices. Despite significant progress in NN model compression, there has been considerably less investigation into the actual physical storage of NN parameters. Conventionally, model compression and physical storage are decoupled, as digital storage media with error-correcting codes (ECCs) provide robust, error-free storage. However, this decoupled approach is inefficient: it ignores the overparameterization present in most NNs and forces the memory device to allocate the same amount of resources to every bit of information regardless of its importance. In this work, we investigate analog memory devices as an alternative to digital media. Unlike its digital counterpart, analog storage naturally allows more protection to be allocated to significant bits, but it is noisy and may compromise the stored model's performance if used naively. We develop a variety of robust coding strategies for NN weight storage on analog devices, and propose an approach to jointly optimize model compression and memory resource allocation. We then demonstrate the efficacy of our approach on models trained on the MNIST, CIFAR-10, and ImageNet datasets, using existing compression techniques. Compared to conventional error-free digital storage, our method reduces the memory footprint by up to one order of magnitude without significantly compromising the stored model's accuracy.
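To make the idea of bit-level resource allocation concrete, below is a minimal NumPy sketch; it is not the paper's actual coding scheme, and all names, noise levels, and the bit-plane storage model are illustrative assumptions. It simulates storing quantized weights bit plane by bit plane on noisy analog cells, once with a uniform noise budget and once with cleaner cells reserved for the most significant bits, and compares the reconstruction error.

```python
# Illustrative sketch only: simulate storing quantized weights on noisy analog
# cells, comparing uniform protection vs. extra protection for significant bits.
# Noise levels and the bit-plane model are assumptions, not the paper's scheme.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.1, size=10_000)    # toy stand-in for model weights

def store_bitwise(w, n_bits=8, cell_sigma=None):
    """Quantize w to n_bits, write each bit plane to an analog cell with
    additive Gaussian noise cell_sigma[i], then read back by thresholding."""
    lo, hi = w.min(), w.max()
    q = np.round((w - lo) / (hi - lo) * (2**n_bits - 1)).astype(int)
    recovered = np.zeros_like(q)
    for i in range(n_bits):                     # i = 0 is the least significant bit
        bit = (q >> i) & 1                      # ideal 0/1 bit plane
        noisy = bit + rng.normal(0.0, cell_sigma[i], size=bit.shape)
        recovered |= (noisy > 0.5).astype(int) << i
    return recovered / (2**n_bits - 1) * (hi - lo) + lo

uniform = np.full(8, 0.35)                      # same noise level for every bit
graded = np.linspace(0.6, 0.1, 8)               # same total budget, cleaner cells for MSBs
for name, sigma in [("uniform", uniform), ("protect MSBs", graded)]:
    w_hat = store_bitwise(weights, cell_sigma=sigma)
    print(f"{name:>12}: MSE = {np.mean((weights - w_hat) ** 2):.2e}")
```

Under an equal total noise budget, the graded allocation typically yields a much lower mean-squared error, which is the intuition behind coupling compression with memory resource allocation rather than treating every stored bit identically.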