Research Article | Public Access

FINN-R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks

Published: 15 December 2018

Abstract

Convolutional Neural Networks have rapidly become the most successful machine-learning algorithm, enabling ubiquitous machine vision and intelligent decisions even on embedded computing systems. While the underlying arithmetic is structurally simple, compute and memory requirements are challenging. One promising opportunity is leveraging reduced-precision representations for inputs, activations, and model parameters. The resulting scalability in performance, power efficiency, and storage footprint offers attractive design compromises in exchange for a small reduction in accuracy. FPGAs are ideal for implementing low-precision inference engines, as custom precisions can be exploited to achieve the numerical accuracy required by a given application. In this article, we describe the second generation of the FINN framework, an end-to-end tool that enables design-space exploration and automates the creation of fully customized inference engines on FPGAs. Given a neural network description, the tool optimizes for a given platform, design target, and specific precision. We introduce formalizations of resource cost functions and performance predictions and elaborate on the optimization algorithms. Finally, we evaluate a selection of reduced-precision neural networks, ranging from CIFAR-10 classifiers to YOLO-based object detection, on a range of platforms including PYNQ and AWS F1, demonstrating unprecedented measured throughput of 50 TOp/s on AWS F1 and 5 TOp/s on embedded devices.
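The performance prediction the abstract refers to can be illustrated with a minimal sketch. In a FINN-style dataflow architecture, every layer is instantiated as its own compute unit with a chosen degree of folding (PE x SIMD multiply-accumulates per cycle), all layers run concurrently, and end-to-end throughput is set by the slowest layer. The code below is an illustrative approximation, not the paper's exact model; the function names and all numbers are hypothetical.

```python
def layer_cycles(total_macs: int, pe: int, simd: int) -> int:
    """Cycles for one layer to process one input frame,
    assuming pe * simd MAC operations complete per clock cycle."""
    parallel_macs = pe * simd
    return -(-total_macs // parallel_macs)  # ceiling division

def pipeline_fps(layers, clock_hz: float) -> float:
    """Frames/s of a dataflow pipeline: all layers run concurrently,
    so throughput is bounded by the layer with the most cycles."""
    bottleneck = max(layer_cycles(m, p, s) for (m, p, s) in layers)
    return clock_hz / bottleneck

# Hypothetical two-layer network: (MACs per frame, PE, SIMD) per layer.
layers = [
    (1_000_000, 16, 32),
    (2_000_000, 32, 32),
]
fps = pipeline_fps(layers, 200e6)  # assuming a 200 MHz clock
```

Under this model, design-space exploration amounts to picking the folding factors (PE, SIMD) per layer so that the slowest layer meets the performance target while the summed resource cost fits the device.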

