Lane Compression: A Lightweight Lossless Compression Method for Machine Learning on Embedded Systems

Abstract
This article presents Lane Compression, a lightweight lossless compression technique for machine learning, based on a detailed study of the statistical properties of machine learning data. The proposed technique profiles machine learning data gathered ahead of run-time and partitions values bit-wise into lanes with more distinctive statistical characteristics. The most appropriate compression scheme is then chosen for each lane from a small set of low-cost compression techniques. Lane Compression's compute and memory requirements are very low, yet it achieves a compression rate comparable to or better than Huffman coding. We evaluate and analyse Lane Compression on a wide range of machine learning networks for both inference and re-training. We also demonstrate that profiling prior to run-time, together with the ability to configure the hardware based on that profiling, guarantees robust performance across different models and datasets. Hardware implementations are described, and the scheme's simplicity makes it suitable for compressing both on-chip and off-chip traffic.
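To make the mechanism concrete, below is a minimal Python sketch of the lane idea as the abstract describes it: values are split bit-wise into fixed-width lanes, each lane is profiled offline, and a coder is selected per lane. The lane widths, the two candidate coders, and the entropy-based selection rule here are illustrative assumptions for exposition, not the paper's actual algorithm or configuration.

```python
# Illustrative sketch of bit-wise lane partitioning with per-lane coder
# selection. Lane widths, coder choices, and the cost model are assumptions.
from collections import Counter
import math

LANE_WIDTHS = [4, 4]  # hypothetical: split 8-bit values into two 4-bit lanes


def split_into_lanes(values, widths=LANE_WIDTHS):
    """Partition each value bit-wise into lanes, least-significant lane first."""
    lanes = [[] for _ in widths]
    for v in values:
        shift = 0
        for i, w in enumerate(widths):
            lanes[i].append((v >> shift) & ((1 << w) - 1))
            shift += w
    return lanes


def entropy_bits(symbols):
    """Shannon entropy in bits/symbol: a lower bound on achievable code length."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def profile(values):
    """Offline profiling step: estimate per-lane entropy and pick the cheaper
    of two hypothetical coders (raw pass-through vs. an entropy-coded lane)."""
    plan = []
    for i, lane in enumerate(split_into_lanes(values)):
        h = entropy_bits(lane)
        # If entropy is close to the lane width there is little to gain,
        # so keep the lane uncompressed; otherwise entropy-code it.
        coder = "raw" if h > LANE_WIDTHS[i] - 0.5 else "entropy"
        plan.append((i, coder, round(h, 2)))
    return plan


# Example: values clustered near zero leave the high-order lane highly skewed.
data = [0, 1, 0, 2, 0, 0, 3, 130, 0, 1, 0, 0]
print(profile(data))
```

In typical machine learning traffic, where weights and activations cluster near zero, the high-order lane tends to be dominated by a few symbols while the low-order lane is closer to uniform, which is what makes a per-lane choice of coder pay off.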