research-article

CaffePresso: Accelerating Convolutional Networks on Embedded SoCs

Published: 14 November 2017
Abstract

Auto-tuning and parametric implementation of deep learning kernels allow off-the-shelf accelerator-based embedded platforms to deliver high-performance, energy-efficient mappings of the inference phase of lightweight neural networks. Low-complexity classifiers are characterized by operations on small image maps with two to three deep layers and few class labels. For these use cases, we consider a range of embedded systems with 20W power budgets, such as the Xilinx ZC706 (FPGA), NVIDIA Jetson TX1 (GPU), TI Keystone II (DSP), and Adapteva Parallella (RISC+NoC). In CaffePresso, we combine auto-tuning of implementation parameters with platform-specific constraints to deliver optimized solutions for each input ConvNet specification.
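The auto-tuning described above can be illustrated with a minimal sketch: enumerate a small space of implementation parameters (here a hypothetical tile size and vector width), discard configurations that violate a platform constraint (a toy on-chip memory budget), and keep the configuration with the lowest modeled cost. The parameter names, cost model, and constraint below are illustrative assumptions, not CaffePresso's actual tuner.

```python
# Hypothetical sketch of accelerator auto-tuning: exhaustive search over
# a small implementation-parameter space, filtered by a platform
# constraint, minimizing a toy cost model. Not CaffePresso's real code.
from itertools import product

def tune(map_size, mem_budget_kb,
         tile_sizes=(8, 16, 32, 64), vector_widths=(2, 4, 8)):
    """Return the (tile, vec) pair minimizing a toy cost model
    subject to a toy local-memory budget (in KB)."""
    best, best_cost = None, float("inf")
    for tile, vec in product(tile_sizes, vector_widths):
        # Constraint: a tile plus a 1-pixel halo of 4-byte pixels
        # must fit in the platform's local memory budget.
        footprint_kb = (tile + 2) * (tile + 2) * 4 / 1024.0
        if footprint_kb > mem_budget_kb:
            continue
        # Cost: fixed per-tile launch overhead plus vectorized inner work.
        n_tiles = (-(-map_size // tile)) ** 2   # ceil-divide, squared
        cost = n_tiles * (50 + tile * tile / vec)
        if cost < best_cost:
            best, best_cost = (tile, vec), cost
    return best

print(tune(map_size=64, mem_budget_kb=4))
```

In a real tuner the modeled cost would be replaced by measured on-device latency, and the constraint set would encode each platform's scratchpad size, DMA alignment, and vector-lane count; the exhaustive loop is feasible exactly because the low-complexity networks targeted here have small parameter spaces.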

Published in

ACM Transactions on Embedded Computing Systems, Volume 17, Issue 1
Special Issue on Autonomous Battery-Free Sensing and Communication, Special Issue on ESWEEK 2016 and Regular Papers
January 2018, 630 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3136518

Copyright © 2017 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 14 November 2017
• Revised: 1 May 2017
• Accepted: 1 May 2017
• Received: 1 January 2017

            Qualifiers

            • research-article
            • Research
            • Refereed
