Abstract
This article proposes EncoDeep, an end-to-end framework that facilitates encoding, bitwidth customization, fine-tuning, and implementation of neural networks on FPGA platforms. EncoDeep incorporates nonlinear encoding into the computation flow of neural networks to save memory. The encoded features demand significantly lower storage than the raw full-precision activation values; therefore, the execution flow of the EncoDeep hardware engine is performed entirely within the FPGA using on-chip streaming buffers, with no access to off-chip DRAM. We further propose a fully automated optimization algorithm that determines the flexible encoding bitwidths across network layers. The EncoDeep full-stack framework comprises a compiler that takes a high-level Python description of an arbitrary neural network and instantiates the corresponding elements from the EncoDeep hardware library for FPGA implementation. Our evaluations on the MNIST, SVHN, and CIFAR-10 datasets demonstrate an average 4.65× throughput improvement over stand-alone weight encoding. We further compare EncoDeep with six FPGA accelerators on ImageNet, showing average improvements of 3.6× in throughput and 2.54× in performance-per-watt, respectively.
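To make the encoding idea concrete, below is a minimal sketch of nonlinear (codebook-based) activation encoding: a small per-layer codebook is learned, and each activation is then stored as a b-bit index into that codebook, so streamed feature maps occupy b bits per value instead of full precision. This is an illustration under stated assumptions, not EncoDeep's actual implementation; the function names (fit_codebook, encode, decode), the plain k-means training loop, and the 3-bit example bitwidth are all hypothetical, and the paper's bitwidth-selection algorithm and hardware mapping are not reproduced here.

```python
# Illustrative sketch of nonlinear activation encoding via a learned codebook.
# All names and the k-means procedure are hypothetical stand-ins; EncoDeep's
# actual encoder, fine-tuning, and FPGA engine are described in the article.
import numpy as np

def fit_codebook(activations, bitwidth, iters=20):
    """Learn a 2**bitwidth-entry codebook with plain k-means (Lloyd's)."""
    k = 2 ** bitwidth
    x = activations.ravel()
    # Initialize centroids uniformly over the observed activation range.
    centers = np.linspace(x.min(), x.max(), k)
    for _ in range(iters):
        idx = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        for c in range(k):
            members = x[idx == c]
            if members.size:
                centers[c] = members.mean()
    return centers

def encode(activations, centers):
    """Replace each value by the index of its nearest codebook entry."""
    flat = activations.ravel()
    idx = np.argmin(np.abs(flat[:, None] - centers[None, :]), axis=1)
    return idx.reshape(activations.shape).astype(np.uint8)

def decode(indices, centers):
    """Look the indices back up to recover approximate activations."""
    return centers[indices]

# Example: a 3-bit code stores one of 8 codebook values per activation,
# so each encoded activation needs 3 bits rather than 32.
acts = np.random.randn(4, 16).astype(np.float32)
centers = fit_codebook(acts, bitwidth=3)
codes = encode(acts, centers)
approx = decode(codes, centers)
print("mean abs reconstruction error:", np.abs(acts - approx).mean())
```

In this sketch the memory saving is immediate: a 3-bit index stream is roughly a 10× reduction over 32-bit activations, which is what allows encoded features to fit in on-chip streaming buffers; per-layer bitwidths would then be chosen by an outer optimization loop such as the one the article automates.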