Research Article · Open Access

Lane Compression: A Lightweight Lossless Compression Method for Machine Learning on Embedded Systems

Published: 18 March 2021

Abstract

This article presents Lane Compression, a lightweight lossless compression technique for machine learning that is based on a detailed study of the statistical properties of machine learning data. The proposed technique profiles machine learning data gathered ahead of run-time and partitions values bit-wise into lanes with more distinctive statistical characteristics. The most appropriate compression technique is then chosen for each lane from a small set of low-cost compression techniques. Lane Compression's compute and memory requirements are very low, yet it achieves a compression rate comparable to or better than Huffman coding. We evaluate and analyse Lane Compression on a wide range of machine learning networks for both inference and re-training. We also demonstrate that profiling prior to run-time, together with the ability to configure the hardware based on the profiling results, guarantees robust performance across different models and datasets. Hardware implementations are described, and the scheme's simplicity makes it suitable for compressing both on-chip and off-chip traffic.
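The abstract only sketches the mechanism at a high level. As a loose, hypothetical illustration of the general idea (bit-wise partitioning of values into lanes, then picking a cheap coder per lane based on its statistics), one might write something like the following. This is not the paper's actual algorithm; the lane widths, the toy zero-run-length coder, and the function names `split_lanes`, `entropy`, and `choose_scheme` are all illustrative assumptions.

```python
import math
from collections import Counter

def split_lanes(values, lane_bits=(4, 4)):
    """Split each 8-bit value bit-wise into lanes (here: upper and lower nibbles)."""
    lanes = [[] for _ in lane_bits]
    for v in values:
        shift = sum(lane_bits)
        for i, width in enumerate(lane_bits):
            shift -= width
            lanes[i].append((v >> shift) & ((1 << width) - 1))
    return lanes

def entropy(symbols):
    """Shannon entropy in bits/symbol -- a lower bound on achievable coding cost."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def choose_scheme(lane, width):
    """Pick the cheaper of two toy coders: raw fixed-width vs. zero-run-length."""
    raw_bits = len(lane) * width
    # zero-RLE: 1 flag bit per symbol, plus `width` payload bits for non-zero symbols
    rle_bits = sum(1 + (width if s != 0 else 0) for s in lane)
    return ("zero-rle", rle_bits) if rle_bits < raw_bits else ("raw", raw_bits)

if __name__ == "__main__":
    # Small activation-like values: upper nibbles are all zero, so that lane
    # compresses almost for free, while the lower-nibble lane stays near raw.
    values = [0x03, 0x00, 0x07, 0x01, 0x00, 0x0F, 0x02, 0x00]
    for i, lane in enumerate(split_lanes(values)):
        scheme, bits = choose_scheme(lane, 4)
        print(f"lane {i}: entropy={entropy(lane):.2f} bits/symbol, "
              f"best={scheme} ({bits} bits)")
```

The point of the sketch is that the lanes have very different statistics: the upper-nibble lane of small values is almost entirely zeros and suits a trivial coder, while the lower-nibble lane carries most of the entropy. Choosing a coder per lane ahead of run-time, from a fixed menu of cheap schemes, keeps the hardware simple.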

