A BNN Accelerator Based on Edge-skip-calculation Strategy and Consolidation Compressed Tree

Published: 10 May 2022
Abstract

Binarized neural networks (BNNs) and batch normalization (BN) have become standard techniques in modern artificial intelligence. Unfortunately, the massive accumulation and multiplication operations in BNN models pose challenges for field-programmable gate array (FPGA) implementations, because the complex arithmetic in BN consumes excessive computing resources. To relax FPGA resource limitations and speed up computation, we propose a BNN accelerator architecture based on a consolidation compressed tree scheme that merges the low-bit XNOR and accumulation operations into a single systematic structure. During the compression process, we adopt 0-padding (rather than ±1) to achieve no accuracy loss from software modeling to hardware implementation. Moreover, we introduce a shift-addition BN-free binarization technique to shorten the delay path and optimize on-chip storage. In sum, we drastically cut hardware consumption while maintaining high speed at the same model complexity as the previous design. We evaluate our accelerator on the MNIST and CIFAR-10 datasets and implement the whole system on an Artix-7 100T FPGA, achieving a speed of 2052.65 GOP/s and an area efficiency of 70.15 GOPS/KLUT.
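The paper's RTL is not reproduced here, but the general computation it accelerates can be sketched in NumPy: a BNN layer reduces to XNOR of bit-encoded operands, a popcount (the part the compressed tree consolidates in hardware), and a per-neuron integer threshold comparison into which BN and the sign activation are folded, so inference needs no multiplies or divides. The function name and threshold encoding below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def bnn_neuron(act_bits, wgt_bits, thresholds):
    """XNOR-popcount-threshold sketch of one binarized layer.

    act_bits:   activations encoded as {0, 1} bits (logical -1/+1).
    wgt_bits:   weight rows, same encoding; one row per output neuron.
    thresholds: per-neuron integer thresholds, assumed to absorb the
                BN scale/shift and the sign activation.
    """
    # XNOR: 1 wherever activation and weight bits agree.
    xnor = np.logical_not(np.logical_xor(act_bits, wgt_bits))
    # Popcount of agreements; in hardware this is the adder/compressor tree.
    popcount = np.count_nonzero(xnor, axis=-1)
    # BN-free binarization: a single integer comparison per neuron.
    return (popcount >= thresholds).astype(np.uint8)

# Tiny illustrative example (values are arbitrary):
act = np.array([1, 0, 1, 1], dtype=np.uint8)
wgt = np.array([[1, 1, 1, 0],
                [0, 1, 0, 0]], dtype=np.uint8)
out = bnn_neuron(act, wgt, thresholds=np.array([2, 1]))
```

Folding BN into the threshold is the standard trick behind "BN-free" inference: since BN followed by sign() is monotone in the popcount, the whole post-processing chain collapses to one precomputed integer comparison.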


Published in

ACM Transactions on Reconfigurable Technology and Systems, Volume 15, Issue 3 (September 2022), 353 pages.
ISSN: 1936-7406; EISSN: 1936-7414
DOI: 10.1145/3508070
Editor: Deming Chen

        ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 10 May 2022
        • Accepted: 1 October 2021
        • Revised: 1 September 2021
        • Received: 1 August 2020


        Qualifiers

        • research-article
        • Refereed