Abstract
Binarized neural networks (BNNs) and batch normalization (BN) have become standard techniques in artificial intelligence today. Unfortunately, the massive accumulation and multiplication in BNN models challenge field-programmable gate array (FPGA) implementations, because the complex arithmetic in BN consumes excessive computing resources. To relax FPGA resource limitations and speed up computation, we propose a BNN accelerator architecture based on a consolidation compressed tree scheme that combines the low-bit XNOR and accumulation operations into a single systematic unit. During the compression process, we adopt 0-padding (rather than ±1 padding) so that no accuracy is lost from software modeling to hardware implementation. Moreover, we introduce a shift-addition BN-free binarization technique to shorten the delay path and optimize on-chip storage. As a result, we drastically reduce hardware consumption while maintaining high speed at the same model complexity as previous designs. We evaluate our accelerator on the MNIST and CIFAR-10 datasets and implement the whole system on an Artix-7 100T FPGA, achieving a throughput of 2052.65 GOPS and an area efficiency of 70.15 GOPS/KLUT.
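As background for the two ideas named above, here is a minimal Python sketch (not the authors' implementation; function names and parameters are illustrative) of how a binarized dot product reduces to XNOR-plus-popcount, and how batch normalization followed by sign binarization folds into a single integer threshold comparison, which is the essence of a BN-free binarization step:

```python
import numpy as np

def xnor_popcount_dot(a_bits: int, w_bits: int, n: int) -> int:
    """Binarized dot product over n positions.
    +1/-1 values are bit-packed (bit 1 -> +1, bit 0 -> -1), so
    dot = 2 * popcount(XNOR(a, w)) - n."""
    xnor = ~(a_bits ^ w_bits) & ((1 << n) - 1)
    return 2 * bin(xnor).count("1") - n

def fold_bn_to_threshold(gamma, beta, mean, var, eps=1e-5):
    """Fold y = gamma*(x - mean)/sqrt(var + eps) + beta followed by
    sign(y) into a comparison x >= t (direction flips if gamma < 0)."""
    t = mean - beta * np.sqrt(var + eps) / gamma
    flip = gamma < 0
    return t, flip

# Sanity check against the floating-point BN + sign reference.
x = np.linspace(-5, 5, 101)
gamma, beta, mean, var = 1.7, 0.3, 0.9, 2.0
t, flip = fold_bn_to_threshold(gamma, beta, mean, var)
ref = np.sign(gamma * (x - mean) / np.sqrt(var + 1e-5) + beta) >= 0
thr = (x <= t) if flip else (x >= t)
assert np.array_equal(ref, thr)
```

On hardware, the folded form matters because the per-channel BN multiply, divide, and square root disappear at inference time: each output needs only a popcount and one comparison against a precomputed threshold, which maps to LUTs with no DSP multipliers.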
A BNN Accelerator Based on Edge-skip-calculation Strategy and Consolidation Compressed Tree