Research Article

Algorithm-hardware Co-optimization for Energy-efficient Drone Detection on Resource-constrained FPGA

Published: 10 May 2023

Abstract

Convolutional neural network (CNN)-based object detection has achieved very high accuracy; e.g., single-shot multi-box detectors (SSDs) can efficiently detect and localize various objects in an input image. However, they require large amounts of computation and memory storage, which makes efficient inference difficult on resource-constrained hardware devices such as drones or unmanned aerial vehicles (UAVs). Drone/UAV detection is an important task for applications including surveillance, defense, and multi-drone self-localization and formation control. In this article, we designed and co-optimized an algorithm and hardware for energy-efficient drone detection on resource-constrained FPGA devices. We trained an SSD object detection algorithm with a custom drone dataset. For inference, we employed low-precision quantization and adapted the width of the SSD CNN model. To improve throughput, we used dual-data-rate operations for DSPs, effectively doubling the throughput with limited DSP counts. For different SSD algorithm models, we analyzed accuracy (mean average precision, mAP) and evaluated the corresponding FPGA hardware utilization, DRAM communication, and throughput optimization. We evaluated the FPGA hardware on a custom drone dataset, Pascal VOC, and COCO 2017. Our proposed design achieves a high mAP of 88.42% on the multi-drone dataset, with a high energy efficiency of 79 GOPS/W and a throughput of 158 GOPS using the Xilinx Zynq ZU3EG FPGA device on the Open Vision Computer version 3 (OVC3) platform. Our design achieves 1.1× to 8.7× higher energy efficiency than prior works evaluated on the same Pascal VOC dataset and the same FPGA device, at a low power consumption of 2.54 W. For the COCO dataset, our MobileNet-V1 implementation achieved an mAP of 16.8 and an energy efficiency of 4.9 FPS/W, which is ∼1.9× higher than prior FPGA works and other commercial hardware platforms.
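The two inference optimizations named above, low-precision quantization and dual-data-rate DSP multiplication, can be sketched roughly as follows. This is a generic illustration, not the paper's exact scheme: the function names, the per-tensor symmetric scaling, the 8-bit operand widths, and the unsigned operand packing (in the spirit of the Double MAC technique for FPGA DSP blocks) are assumptions made for clarity.

```python
import numpy as np

def quantize_symmetric(w, bits=8):
    """Map float weights to signed integers with a per-tensor scale.

    A minimal symmetric-quantization sketch; real designs may use
    per-channel scales, calibration, or quantization-aware training.
    """
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

def double_mac(a1, a2, w, gap=16):
    """Two 8-bit multiplications on one wide multiplier (one DSP slice).

    Pack both activations into a single operand with a guard gap wide
    enough that the partial products cannot overlap, do one multiply,
    then split the product. Unsigned case only; signed operands need an
    extra sign-correction step in hardware.
    """
    assert 0 <= a1 < 256 and 0 <= a2 < 256 and 0 <= w < 256
    packed = (a1 << gap) + a2          # one operand carries both inputs
    product = packed * w               # single wide multiply
    p2 = product & ((1 << gap) - 1)    # low field  -> a2 * w
    p1 = product >> gap                # high field -> a1 * w
    return p1, p2
```

Packing two multiplies per DSP is what lets a small device such as the ZU3EG, with a limited DSP count, roughly double its effective MAC throughput.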

REFERENCES

  1. [1] Shehab Mohanad Abd, Al-Gizi Ammar, and Swadi Salah M.. 2021. Efficient real-time object detection based on convolutional neural network. In 2021 International Conference on Applied and Theoretical Electricity (ICATE’21). IEEE, 15.Google ScholarGoogle ScholarCross RefCross Ref
  2. [2] Chiu Yu-Chen, Tsai Chi-Yi, Ruan Mind-Da, Shen Guan-Yu, and Lee Tsu-Tian. 2020. Mobilenet-SSDv2: An improved object detection model for embedded systems. In 2020 International Conference on System Science and Engineering (ICSSE’20). IEEE, 15.Google ScholarGoogle ScholarCross RefCross Ref
  3. [3] Choi Jungwook, Venkataramani Swagath, Srinivasan Vijayalakshmi, Gopalakrishnan Kailash, Wang Zhuo, and Chuang Pierce. 2019. Accurate and efficient 2-bit quantized neural networks. In Conference on Machine Learning and Systems (MLSys’19).Google ScholarGoogle Scholar
  4. [4] Ding Xiaohan, Zhang Xiangyu, Ma Ningning, Han Jungong, Ding Guiguang, and Sun Jian. 2021. RepVGG: Making VGG-Style ConvNets great again. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR’21). 1373313742.Google ScholarGoogle ScholarCross RefCross Ref
  5. [5] Fang Wei, Wang Lin, and Ren Peiming. 2019. Tinier-YOLO: A real-time object detection method for constrained environments. IEEE Access 8 (2019), 19351944.Google ScholarGoogle ScholarCross RefCross Ref
  6. [6] Howard Andrew G., Zhu Menglong, Chen Bo, Kalenichenko Dmitry, Wang Weijun, Weyand Tobias, Andreetto Marco, and Adam Hartwig. 2017. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017).Google ScholarGoogle Scholar
  7. [7] Jacob Benoit, Kligys Skirmantas, Chen Bo, Zhu Menglong, Tang Matthew, Howard Andrew, Adam Hartwig, and Kalenichenko Dmitry. 2018. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR’18). 27042713.Google ScholarGoogle ScholarCross RefCross Ref
  8. [8] Kalali Ercan and Leuken Rene Van. 2021. Near-precise parameter approximation for multiple multiplications on a single DSP block. IEEE Trans. Comput. 71, 9 (2021), 2036–2047.Google ScholarGoogle ScholarCross RefCross Ref
  9. [9] Krishnamoorthi Raghuraman. 2018. Quantizing deep convolutional networks for efficient inference: A whitepaper. arXiv preprint arXiv:1806.08342 (2018).Google ScholarGoogle Scholar
  10. [10] Li Fanrong, Mo Zitao, Wang Peisong, Liu Zejian, Zhang Jiayun, Li Gang, Hu Qinghao, He Xiangyu, Leng Cong, Zhang Yang, and Cheng Jian. 2019. A system-level solution for low-power object detection. In IEEE/CVF International Conference on Computer Vision Workshops.Google ScholarGoogle ScholarCross RefCross Ref
  11. [11] Li Yuhang, Dong Xin, and Wang Wei. 2019. Additive powers-of-two quantization: An efficient non-uniform discretization for neural networks. In International Conference on Learning Representations (ICLR’19).Google ScholarGoogle Scholar
  12. [12] Lin Mingbao, Ji Rongrong, Xu Zihan, Zhang Baochang, Wang Yan, Wu Yongjian, Huang Feiyue, and Lin Chia-Wen. 2020. Rotated binary neural network. Advances in Neural Information Processing Systems 33 (2020), 7474–7485.Google ScholarGoogle Scholar
  13. [13] Liu Wei, Anguelov Dragomir, Erhan Dumitru, Szegedy Christian, Reed Scott, Fu Cheng-Yang, and Berg Alexander C.. 2016. SSD: Single shot multibox detector. In European Conference on Computer Vision (ECCV’16). 2137.Google ScholarGoogle ScholarCross RefCross Ref
  14. [14] Ma Yufei, Cao Yu, Vrudhula Sarma, and Seo Jaesun. 2018. Optimizing the convolution operation to accelerate deep neural networks on FPGA. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 26, 7 (2018), 13541367.Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. [15] Ma Yufei, Zheng Tu, Cao Yu, Vrudhula Sarma, and Seo Jaesun. 2018. Algorithm-hardware co-design of single shot detector for fast object detection on FPGAs. In IEEE/ACM International Conference on Computer-Aided Design (ICCAD’18). 18.Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. [16] Nguyen Dong, Kim Daewoo, and Lee Jongeun. 2017. Double MAC: Doubling the performance of convolutional neural networks on modern FPGAs. In Design, Automation & Test in Europe Conference & Exhibition (DATE’17). IEEE, 890893.Google ScholarGoogle ScholarCross RefCross Ref
  17. [17] Nguyen Ty, Miller Ian D., Cohen Avi, Thakur Dinesh, Guru Arjun, Prasad Shashank, Taylor Camillo J., Chaudhari Pratik, and Kumar Vijay. 2021. PennSyn2Real: Training object recognition models without human labeling. IEEE Robotics and Automation Letters 6, 3 (2021), 50325039.Google ScholarGoogle ScholarCross RefCross Ref
  18. [18] Park Eunhyeok and Yoo Sungjoo. 2020. PROFIT: A novel training method for sub-4-bit MobileNet models. In European Conference on Computer Vision (ECCV’20). 430446.Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. [19] Qin Zheng, Li Zeming, Zhang Zhaoning, Bao Yiping, Yu Gang, Peng Yuxing, and Sun Jian. 2019. ThunderNet: Towards real-time generic object detection on mobile devices. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 67186727.Google ScholarGoogle ScholarCross RefCross Ref
  20. [20] Quigley Morgan, Mohta Kartik, Shivakumar Shreyas S., Watterson Michael, Mulgaonkar Yash, Arguedas Mikael, Sun Ke, Liu Sikang, Pfrommer Bernd, Kumar Vijay, and Taylor Camillo J.. 2018. The open vision computer: An integrated sensing and compute system for mobile robots. CoRR abs/1809.07674 (2018). arxiv:1809.07674 http://arxiv.org/abs/1809.07674.Google ScholarGoogle Scholar
  21. [21] Sze Vivienne, Chen Yu-Hsin, Yang Tien-Ju, and Emer Joel S.. 2020. How to evaluate deep neural network processors: TOPS/W (alone) considered harmful. IEEE Solid-State Circuits Magazine 12, 3 (2020), 2841. DOI:Google ScholarGoogle ScholarCross RefCross Ref
  22. [22] Wu Di, Zhang Yu, Jia Xijie, Tian Lu, Li Tianping, Sui Lingzhi, Xie Dongliang, and Shan Yi. 2019. A high-performance CNN processor based on FPGA for MobileNets. In IEEE International Conference on Field Programmable Logic and Applications (FPL’19). 136143.Google ScholarGoogle ScholarCross RefCross Ref
  23. [23] Xilinx. 2020. Product Guide of DPU (Deep Learning Processing Unit) IP. https://www.xilinx.com/support/documentation/ip_documentation/dpu/v3_2/pg338-dpu.pdf.Google ScholarGoogle Scholar
  24. [24] Xu Xiaowei, Zhang Xinyi, Yu Bei, Hu X. Sharon, Rowen Christopher, Hu Jingtong, and Shi Yiyu. 2021. DAC-SDC low power object detection challenge for UAV applications. IEEE Transactions on Pattern Analysis and Machine Intelligence 43, 2 (2021), 392403.Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. [25] Zhang Xiaofan, Hao Cong, Lu Haoming, Li Jiachen, Li Yuhong, Fan Yuchen, Rupnow Kyle, Xiong Jinjun, Huang Thomas, Shi Honghui, et al. 2019. Skynet: A champion model for DAC-SDC on low power object detection. arXiv preprint arXiv:1906.10327 (2019).Google ScholarGoogle Scholar
  26. [26] Zhao Yiren, Gao Xitong, Guo Xuan, Liu Junyi, Wang Erwei, Mullins Robert, Cheung Peter Y. K., Constantinides George, and Xu Cheng-Zhong. 2019. Automatic generation of multi-precision multi-arithmetic CNN accelerators for FPGAs. In 2019 International Conference on Field-Programmable Technology (ICFPT’19). IEEE, 4553.Google ScholarGoogle ScholarCross RefCross Ref

Published in: ACM Transactions on Reconfigurable Technology and Systems, Volume 16, Issue 2, June 2023, 451 pages. ISSN: 1936-7406. EISSN: 1936-7414. DOI: 10.1145/3587031. Editor: Deming Chen.

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher: Association for Computing Machinery, New York, NY, United States.

Publication History

• Published: 10 May 2023
• Online AM: 15 February 2023
• Accepted: 27 January 2023
• Revised: 3 December 2022
• Received: 29 May 2022

