research-article

Quantized Sparse Training: A Unified Trainable Framework for Joint Pruning and Quantization in DNNs

Published: 08 October 2022

Abstract

Deep neural networks typically contain a large number of parameters and computational operations. Pruning and quantization are widely used to reduce the complexity of deep models, and the two techniques can be combined to achieve significantly higher compression ratios. However, separate optimization processes and the difficulty of choosing hyperparameters limit their joint application. In this study, we propose a novel compression framework, termed quantized sparse training, that prunes and quantizes networks jointly in a unified training process. We integrate pruning and quantization into a gradient-based optimization process built on the straight-through estimator, which enables us to simultaneously train, prune, and quantize a network from scratch. Empirical results validate the superiority of the proposed method over recent state-of-the-art baselines with respect to both model size and accuracy. Specifically, quantized sparse training compresses VGG16 to a 135 KB model without any accuracy degradation, which is 40% of the model size achievable with the state-of-the-art combined pruning-and-quantization approach.
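To make the general idea concrete, the following is a minimal NumPy sketch of one common way to combine the two steps in a single forward/backward pass: magnitude pruning, uniform symmetric quantization, and a straight-through estimator (STE) backward that treats rounding as identity. This is an illustration under our own assumptions (helper names, 4-bit symmetric quantizer, fixed magnitude threshold), not the authors' implementation:

```python
import numpy as np

def quantize(w, num_bits=4):
    # Uniform symmetric quantization: scale so the largest magnitude maps
    # to the largest integer level, round, then rescale back.
    qmax = 2 ** (num_bits - 1) - 1        # e.g. 7 positive levels for 4 bits
    max_abs = np.abs(w).max()
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.round(w / scale) * scale

def forward(w, threshold):
    # Prune-then-quantize forward pass on the dense weights.
    mask = (np.abs(w) >= threshold).astype(w.dtype)  # magnitude pruning mask
    return quantize(w * mask), mask

def ste_backward(grad_out, mask):
    # Straight-through estimator: the non-differentiable round() is treated
    # as identity, so the dense weights receive the output gradient wherever
    # the pruning mask is active.
    return grad_out * mask

w = np.array([0.05, -0.8, 0.4, 1.0])
w_eff, mask = forward(w, threshold=0.1)   # pruned + quantized effective weights
g = ste_backward(np.ones_like(w), mask)   # gradient flowing back to dense w
```

Because the forward pass operates on quantized sparse weights while the STE backward updates the underlying dense full-precision weights, both the mask and the quantized values can evolve throughout training, which is what allows pruning, quantization, and training to proceed jointly from scratch.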



• Published in

ACM Transactions on Embedded Computing Systems, Volume 21, Issue 5
September 2022, 526 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3561947
• Editor: Tulika Mitra


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 8 October 2022
• Online AM: 15 July 2022
• Accepted: 3 March 2022
• Revised: 25 February 2022
• Received: 1 June 2021

Published in TECS Volume 21, Issue 5

Qualifiers

• research-article
• Refereed