An Efficient CNN Accelerator for Low-Cost Edge Systems

Published: 23 August 2022

Abstract

Customized hardware-based convolutional neural network (CNN, or ConvNet) accelerators have attracted significant attention for applications in low-cost edge computing systems. However, there is a lack of research that optimizes at both the algorithm and hardware levels simultaneously in resource-constrained FPGA systems. In this paper, we first analyze ConvNet models to find the one most suitable for a low-cost FPGA implementation. Based on this analysis, we select MobileNetV2 as the backbone of our research due to its hardware-friendly structure. We use a quantized implementation with 4-bit precision and optimize further with a smaller input resolution of 192 × 192, obtaining 68.8% accuracy on ImageNet, only a 3.2% loss compared to a floating-point model that uses the full input size. We then develop a hardware implementation on a low-cost FPGA. To accelerate the depth-wise separable ConvNet and utilize DRAM resources efficiently with parallel processing, we propose a novel scoreboard architecture that dynamically schedules DRAM data requests in order to maintain high hardware utilization. The number of DSP blocks used is about six times smaller than in prior work, and internal block RAM utilization is approximately nine times more efficient than in prior work. Our proposed design achieves 3.07 frames per second (FPS) on the low-cost, resource-constrained FPGA system.
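The abstract's 4-bit quantization can be illustrated with a minimal sketch of symmetric per-channel weight quantization. This is a generic illustration, not the paper's exact scheme: the function names (`quantize_4bit`, `dequantize`) and the per-output-channel scaling choice are assumptions for the example.

```python
import numpy as np

def quantize_4bit(w, per_channel_axis=0):
    """Symmetric per-channel 4-bit quantization (illustrative sketch).

    Maps float weights to signed integers in [-8, 7] with one scale
    per output channel.
    """
    # One scale per output channel, from that channel's max magnitude.
    axes = tuple(i for i in range(w.ndim) if i != per_channel_axis)
    max_abs = np.max(np.abs(w), axis=axes, keepdims=True)
    scale = max_abs / 7.0                      # signed 4-bit range is [-8, 7]
    scale = np.where(scale == 0, 1.0, scale)   # guard against all-zero channels
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights to inspect the rounding error.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 3, 3)).astype(np.float32)  # conv weight tensor
q, scale = quantize_4bit(w)
w_hat = dequantize(q, scale)
print(q.min(), q.max())            # stays within the 4-bit range [-8, 7]
print(np.max(np.abs(w - w_hat)))   # rounding error, at most scale/2 per channel
```

Because the scale is chosen so the largest weight in each channel maps exactly to 7, no value is clipped and the worst-case error per weight is half a quantization step.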


Published in

ACM Transactions on Embedded Computing Systems, Volume 21, Issue 4
July 2022, 330 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3551651
Editor: Tulika Mitra

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

      Publication History

      • Published: 23 August 2022
      • Online AM: 26 May 2022
      • Revised: 1 May 2022
      • Accepted: 1 May 2022
      • Received: 1 August 2021


      Qualifiers

      • research-article
      • Refereed
