xDNN: Inference for Deep Convolutional Neural Networks

Published: 11 January 2022

Abstract

We present xDNN, an end-to-end system for deep-learning inference built on a family of specialized hardware processors for Convolutional Neural Networks (CNNs), synthesized on Field-Programmable Gate Arrays (FPGAs). We present a design optimized for low latency, high throughput, and high compute efficiency without batching. The design is scalable and a parametric function of the number of multiply-accumulate units, the on-chip memory hierarchy, and the numerical precision: it can be scaled down to a processor for embedded devices, replicated to produce more cores on larger devices, or resized to optimize efficiency. On a Xilinx Virtex UltraScale+ VU13P FPGA, we achieve an 800 MHz clock rate, close to the maximum frequency of the Digital Signal Processing (DSP) blocks, with above 80% efficiency of the on-chip compute resources.
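The compute-efficiency figure above can be made concrete with a back-of-the-envelope sketch (the function names and the 4096-MAC array size are illustrative assumptions, not figures from the paper): peak throughput of a MAC array is twice the MAC count times the clock frequency, and efficiency is the sustained fraction of that peak.

```python
def peak_tops(num_macs: int, freq_hz: float) -> float:
    """Peak tera-operations/s: each MAC counts as 2 ops (multiply + add)."""
    return 2 * num_macs * freq_hz / 1e12

def efficiency(achieved_tops: float, num_macs: int, freq_hz: float) -> float:
    """Fraction of the theoretical peak actually sustained."""
    return achieved_tops / peak_tops(num_macs, freq_hz)

# Example: a hypothetical 4096-MAC array at the 800 MHz reported above.
peak = peak_tops(4096, 800e6)        # 6.5536 TOPS
eff = efficiency(5.4, 4096, 800e6)   # ~0.82, i.e., above 80% of peak
```

The same arithmetic scales with the parametric design: doubling the MAC count doubles peak throughput at a fixed clock, which is what makes the efficiency metric a fair comparison across differently sized instances.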

On top of our processor family, we present a runtime system that enables the execution of different networks with different input sizes (from 224×224 to 2048×1024). We present a compiler that reads CNNs from native frameworks (MXNet, Caffe, Keras, and TensorFlow), optimizes them, generates code, and provides performance estimates. The compiler combines quantization information from the native environment with optimizations to feed the runtime with code as efficient as a hardware expert could write. We also present tools that partition a CNN into subgraphs, dividing the work between CPU cores and FPGAs. Note that the software will not change if the FPGA design becomes an ASIC, making our work a vertical solution rather than just a proof-of-concept FPGA project.
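A minimal sketch of the kind of CPU/FPGA partitioning described above, assuming a topologically ordered operator list (all names, the supported-op set, and the function are hypothetical illustrations, not the paper's actual tooling): the network is split into maximal runs of accelerator-supported operators, with everything else falling back to CPU subgraphs.

```python
# Hypothetical set of operator types the accelerator can execute.
FPGA_SUPPORTED = {"conv", "relu", "maxpool", "eltwise_add"}

def partition(ops):
    """Split an ordered list of (name, op_type) nodes into
    ('fpga' | 'cpu', [node names]) subgraphs, merging adjacent
    nodes that target the same device."""
    subgraphs = []
    for name, op_type in ops:
        target = "fpga" if op_type in FPGA_SUPPORTED else "cpu"
        if subgraphs and subgraphs[-1][0] == target:
            subgraphs[-1][1].append(name)  # extend the current run
        else:
            subgraphs.append((target, [name]))  # start a new subgraph
    return subgraphs

net = [("conv1", "conv"), ("relu1", "relu"), ("softmax", "softmax")]
partition(net)
# → [('fpga', ['conv1', 'relu1']), ('cpu', ['softmax'])]
```

Merging adjacent same-device runs matters in practice: each subgraph boundary implies a host-device data transfer, so fewer, larger subgraphs amortize that cost.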

We show experimental results for accuracy, latency, and power across several networks. In summary, we achieve up to 4 times higher throughput and 3 times better power efficiency than GPUs, and up to 20 times higher throughput than the latest CPUs. To our knowledge, our solutions are faster than any previous FPGA-based solution and comparable to other state-of-the-art solutions.


Published in

ACM Transactions on Reconfigurable Technology and Systems, Volume 15, Issue 2 (June 2022), 310 pages
ISSN: 1936-7406, EISSN: 1936-7414
DOI: 10.1145/3501287
Editor: Deming Chen


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 January 2021
• Revised: 1 May 2021
• Accepted: 1 June 2021
• Published: 11 January 2022

Published in TRETS Volume 15, Issue 2

            Qualifiers

            • research-article
            • Refereed
