Research article | Open Access

3PXNet: Pruned-Permuted-Packed XNOR Networks for Edge Machine Learning

Published: 06 February 2020

Abstract

As the adoption of Neural Networks proliferates across different classes of applications and systems, edge devices have been left behind: their strict energy and storage limitations leave them unable to cope with the sizes of common network models. Many compression methods, such as precision reduction and sparsity, have been proposed to alleviate this, but they do not go far enough. To push size reduction to its limits, we combine binarization with sparsity in Pruned-Permuted-Packed XNOR Networks (3PXNet), which can be implemented efficiently even on the smallest embedded microcontrollers. 3PXNets reduce model sizes by up to 38X and runtime by up to 3X compared with already compact conventional binarized implementations, with less than 3% accuracy loss. We have created the first software implementation of sparse-binarized Neural Networks, released as an open-source library targeting edge devices. The library is complete with a training methodology and model-generation scripts, making it easy and fast to deploy.
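The abstract's core idea, binarizing weights and activations so that multiply-accumulate collapses into XNOR plus popcount over packed words, can be illustrated with a short sketch. This is not the 3PXNet library's actual API (the released library is written in C for microcontrollers); it is a minimal Python model of the packed XNOR dot product, where +1/-1 values are packed one bit per position into 32-bit words and a match between two bits contributes +1 to the dot product while a mismatch contributes -1:

```python
# Hypothetical sketch of a packed XNOR dot product, not the 3PXNet API.
# +1/-1 values are packed as bits (1 -> bit set); multiplication becomes
# XNOR and accumulation becomes a population count.

WORD = 32  # pack width, matching a 32-bit microcontroller register
MASK = 0xFFFFFFFF

def pack(values):
    """Pack a list of +1/-1 values into 32-bit words (MSB-first per word)."""
    words = []
    for i in range(0, len(values), WORD):
        w = 0
        for v in values[i:i + WORD]:
            w = (w << 1) | (1 if v > 0 else 0)
        words.append(w)
    return words

def xnor_dot(a_words, b_words, n):
    """Binarized dot product of n packed +1/-1 values."""
    matches = 0
    for a, b in zip(a_words, b_words):
        # XNOR marks positions where the signs agree; popcount counts them.
        matches += bin(~(a ^ b) & MASK).count("1")
    # Unused high bits of a partial last word are zero in both operands,
    # so they always "match"; subtract that padding out.
    matches -= len(a_words) * WORD - n
    # Each match contributes +1, each mismatch -1.
    return 2 * matches - n
```

One 32-bit XNOR plus one popcount replaces 32 multiply-accumulates, which is the source of the runtime gains the abstract reports; the "pruned-permuted" part of 3PXNet then arranges sparsity so that entire packed words can be skipped.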

