Abstract
Convolutional neural network (CNN)-based object detection has achieved very high accuracy; for example, single-shot multi-box detectors (SSDs) can efficiently detect and localize various objects in an input image. However, these detectors require a large amount of computation and memory, which makes efficient inference difficult on resource-constrained hardware such as that carried by drones or unmanned aerial vehicles (UAVs). Drone/UAV detection is an important task for applications including surveillance, defense, and multi-drone self-localization and formation control. In this article, we design and co-optimize an algorithm and hardware for energy-efficient drone detection on resource-constrained FPGA devices. We train an SSD object detection algorithm with a custom drone dataset. For inference, we employ low-precision quantization and adapt the width of the SSD CNN model. To improve throughput, we use dual-data-rate operations for DSPs, which effectively doubles the throughput with a limited DSP count. For different SSD algorithm models, we analyze accuracy in terms of mean average precision (mAP) and evaluate the corresponding FPGA hardware utilization, DRAM communication, and throughput optimization. We evaluate the FPGA hardware on a custom drone dataset, Pascal VOC, and COCO2017. Our proposed design achieves a high mAP of 88.42% on the multi-drone dataset, with a high energy efficiency of 79 GOPS/W and a throughput of 158 GOPS, using the Xilinx Zynq ZU3EG FPGA device on the Open Vision Computer version 3 (OVC3) platform. Our design achieves 1.1× to 8.7× higher energy efficiency than prior works that used the same Pascal VOC dataset, using the same FPGA device, at a low power consumption of 2.54 W. For the COCO dataset, our MobileNet-V1 implementation achieves an mAP of 16.8 and an energy efficiency of 4.9 FPS/W, which is approximately 1.9× higher than prior FPGA works or other commercial hardware platforms.
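To make the idea of low-precision quantization concrete, the following is a minimal sketch of uniform symmetric quantization, one common scheme. The abstract does not state which quantization scheme or bit width is used, so the function names, the `num_bits` parameter, and the example weights are all illustrative assumptions rather than the authors' method.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers with a single shared scale factor.

    Assumption: uniform symmetric quantization; the paper's actual scheme
    and bit width are not given in the abstract.
    """
    qmax = 2 ** (num_bits - 1) - 1              # largest representable magnitude
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the signed range.
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from integer levels."""
    return [q * scale for q in quantized]


# Hypothetical example weights, chosen only for illustration.
weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_symmetric(weights, num_bits=8)
approx = dequantize(q, scale)
```

With this scheme, each recovered value differs from the original by at most half a quantization step (`scale / 2`). Lower bit widths reduce storage and arithmetic cost at the price of a larger step.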
Index Terms
Algorithm-hardware Co-optimization for Energy-efficient Drone Detection on Resource-constrained FPGA