research-article

Optimizing CNN-based Segmentation with Deeply Customized Convolutional and Deconvolutional Architectures on FPGA

Published: 20 December 2018

Abstract

Convolutional Neural Network (CNN)-based algorithms have been successful in solving image recognition problems, delivering large accuracy improvements. In recent years, deconvolution layers have been widely used as key components in state-of-the-art CNNs for end-to-end training and in models that support tasks such as image segmentation and super resolution. However, deconvolution algorithms are computationally intensive, which limits their applicability to real-time applications. In particular, there has been little research on efficient implementations of deconvolution algorithms on FPGA platforms, which practitioners and researchers widely use to accelerate CNN algorithms because of their high performance and power efficiency. In this work, we propose and develop a deconvolution architecture for efficient FPGA implementation. FPGA-based accelerators are proposed for both deconvolution and CNN algorithms. In addition, memory sharing between the computation modules is proposed for the FPGA-based CNN accelerator, along with other optimization techniques. A non-linear optimization model based on the performance model is introduced to efficiently explore the design space, in order to achieve the optimal processing speed of the system and improve power efficiency. Furthermore, a hardware mapping framework is developed to automatically generate a low-latency hardware design for any given CNN model on the target device. Finally, we implement our designs on a Xilinx Zynq ZC706 board. The deconvolution accelerator achieves a performance of 90.1 giga operations per second (GOPS) at a working frequency of 200 MHz and a performance density of 0.10 GOPS/DSP using 32-bit quantization, which significantly outperforms previous designs on FPGAs. A real-time scene segmentation application on the Cityscapes dataset is used to evaluate our CNN accelerator on the Zynq ZC706 board. The system achieves a performance of 107 GOPS and 0.12 GOPS/DSP using 16-bit quantization and supports up to 17 frames per second for 512 × 512 image inputs with a power consumption of only 9.6 W.
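
For readers unfamiliar with the operation, the deconvolution (transposed convolution) layer discussed in the abstract can be sketched as follows. This is a minimal single-channel reference in plain Python with no padding, intended only to show why the operation is computationally intensive; it is not the architecture proposed in the paper. The names `deconv2d` and `deconv_macs`, and the multiply-accumulate count formula, are illustrative assumptions.

```python
def deconv2d(x, w, stride):
    """Naive single-channel transposed convolution without padding.

    x: input feature map as a list of rows (H x W)
    w: square kernel as a list of rows (K x K)
    Each input pixel scatters a scaled copy of the kernel into the output,
    so each output side has length (size - 1) * stride + K.
    """
    h, wd = len(x), len(x[0])
    k = len(w)
    out_h = (h - 1) * stride + k
    out_w = (wd - 1) * stride + k
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(h):
        for j in range(wd):
            for u in range(k):
                for v in range(k):
                    # Overlapping kernel copies accumulate in the output.
                    out[i * stride + u][j * stride + v] += x[i][j] * w[u][v]
    return out


def deconv_macs(h, w, k, c_in, c_out):
    """Multiply-accumulate count for one layer under the naive scheme above,
    assuming every input channel contributes to every output channel."""
    return h * w * k * k * c_in * c_out


# Usage: a 2 x 2 input upsampled with a 3 x 3 kernel and stride 2.
x = [[1.0, 1.0], [1.0, 1.0]]
w = [[1.0] * 3 for _ in range(3)]
y = deconv2d(x, w, stride=2)
```

In this scheme the work grows with the input area, the kernel area, and the product of the channel counts, which gives a rough sense of the cost that motivates hardware acceleration.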



      • Published in

        ACM Transactions on Reconfigurable Technology and Systems  Volume 11, Issue 3
        Special Issue on Deep learning on FPGAs
        September 2018
        187 pages
        ISSN: 1936-7406
        EISSN: 1936-7414
        DOI: 10.1145/3299999
        • Editor: Steve Wilton

        Copyright © 2018 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 20 December 2018
        • Accepted: 1 July 2018
        • Revised: 1 May 2018
        • Received: 1 December 2017


        Qualifiers

        • research-article
        • Research
        • Refereed
