Research article | Open Access

EncoDeep: Realizing Bit-flexible Encoding for Deep Neural Networks

Published: 29 September 2020

Abstract

This article proposes EncoDeep, an end-to-end framework that facilitates encoding, bitwidth customization, fine-tuning, and implementation of neural networks on FPGA platforms. EncoDeep incorporates nonlinear encoding into the computation flow of neural networks to save memory. The encoded features demand significantly less storage than the raw full-precision activation values; therefore, the execution flow of the EncoDeep hardware engine is performed entirely within the FPGA using on-chip streaming buffers, with no access to off-chip DRAM. We further propose a fully automated optimization algorithm that determines flexible encoding bitwidths across network layers. The EncoDeep full-stack framework comprises a compiler that takes a high-level Python description of an arbitrary neural network and instantiates the corresponding elements from the EncoDeep hardware library for FPGA implementation. Our evaluations on the MNIST, SVHN, and CIFAR-10 datasets demonstrate an average 4.65× throughput improvement over stand-alone weight encoding. We further compare EncoDeep with six FPGA accelerators on ImageNet, showing average improvements of 3.6× in throughput and 2.54× in performance-per-watt.
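To illustrate the idea of nonlinear (codebook) encoding of activations described above, the sketch below clusters activation values into 2^b centroids and stores only the centroid indices, so each value occupies b bits of logical storage instead of 32. This is a minimal illustration under assumed choices (a simple k-means codebook, hypothetical function names), not EncoDeep's exact algorithm or implementation.

```python
import numpy as np

def build_codebook(activations, bitwidth):
    """Cluster activation values into 2**bitwidth centroids via simple k-means.

    Illustrative sketch only: the k-means choice and these names are
    assumptions for demonstration, not the paper's exact method.
    """
    k = 2 ** bitwidth
    flat = activations.ravel()
    # Initialize centroids at evenly spaced quantiles of the data.
    centroids = np.quantile(flat, np.linspace(0.0, 1.0, k))
    for _ in range(10):  # a few Lloyd iterations
        codes = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        for j in range(k):
            if np.any(codes == j):
                centroids[j] = flat[codes == j].mean()
    return centroids

def encode(activations, centroids):
    """Replace each value with the index of its nearest centroid."""
    flat = activations.ravel()
    codes = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
    return codes.reshape(activations.shape).astype(np.uint8)

def decode(codes, centroids):
    """Recover approximate activations from the stored indices."""
    return centroids[codes]

# Example: with bitwidth=3, only 8 distinct symbols are stored per value,
# so an on-chip buffer needs 3 bits per activation instead of 32.
rng = np.random.default_rng(0)
acts = rng.standard_normal((4, 16)).astype(np.float32)
cb = build_codebook(acts, bitwidth=3)
codes = encode(acts, cb)
approx = decode(codes, cb)
```

Because the bitwidth parameter is per-layer, an automated search (as EncoDeep proposes) can assign smaller codebooks to layers that tolerate coarser quantization.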

