PuDianNao: A Polyvalent Machine Learning Accelerator

Published: 14 March 2015

Abstract

Machine Learning (ML) techniques are pervasive in various emerging commercial applications, but they must be supported by powerful computer systems to process very large datasets. Although general-purpose CPUs and GPUs provide straightforward solutions, their energy efficiency is limited by the overheads they pay for flexibility. Hardware accelerators can achieve better energy efficiency, but each accelerator typically supports only a single ML technique (or family of techniques). According to the well-known No-Free-Lunch theorem in the ML domain, however, an ML technique that performs well on one dataset may perform poorly on another, which implies that such an accelerator may sometimes yield poor learning accuracy. Even setting learning accuracy aside, such an accelerator can become inapplicable simply because the concrete ML task changes, or because the user chooses another ML technique.
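The No-Free-Lunch effect described above is easy to reproduce on toy data. The sketch below (plain Python; the two datasets are hypothetical and chosen purely for illustration) pits a nearest-centroid classifier against a 1-nearest-neighbor classifier under leave-one-out evaluation: 1-NN wins on an XOR-shaped dataset whose class centroids coincide, while nearest-centroid wins on tight clusters containing a mislabeled outlier.

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def loo_accuracy(points, labels, predict):
    """Leave-one-out accuracy of a classifier predict(train_pts, train_lbls, x)."""
    correct = 0
    for i, (p, y) in enumerate(zip(points, labels)):
        train_p = points[:i] + points[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_p, train_y, p) == y:
            correct += 1
    return correct / len(points)

def knn1(train_p, train_y, x):
    """1-nearest-neighbor: copy the label of the closest training point."""
    return min(zip(train_p, train_y), key=lambda py: dist(py[0], x))[1]

def nearest_centroid(train_p, train_y, x):
    """Nearest-centroid: label of the class whose mean point is closest to x."""
    centroids = {}
    for c in set(train_y):
        members = [p for p, y in zip(train_p, train_y) if y == c]
        centroids[c] = (sum(p[0] for p in members) / len(members),
                        sum(p[1] for p in members) / len(members))
    return min(centroids, key=lambda c: dist(centroids[c], x))

# Dataset A: XOR-shaped clusters -- both class centroids sit near (0.5, 0.5),
# so nearest-centroid fails, while 1-NN finds a same-cluster neighbor.
xor_p = [(0, 0), (0.1, 0), (0, 0.1), (1, 1), (1.1, 1), (1, 1.1),   # class 0
         (0, 1), (0.1, 1), (0, 0.9), (1, 0), (0.9, 0), (1, 0.1)]   # class 1
xor_y = [0] * 6 + [1] * 6

# Dataset B: two tight clusters plus one mislabeled class-0 point at (5, 5);
# 1-NN is derailed by the outlier, nearest-centroid is robust to it.
out_p = [(0, 0), (0.1, 0), (0, 0.1), (5, 5),
         (5, 5.1), (5.1, 5), (5, 4.9), (4.9, 5)]
out_y = [0] * 4 + [1] * 4

print("XOR:     1-NN", loo_accuracy(xor_p, xor_y, knn1),
      " centroid", loo_accuracy(xor_p, xor_y, nearest_centroid))
print("Outlier: 1-NN", loo_accuracy(out_p, out_y, knn1),
      " centroid", loo_accuracy(out_p, out_y, nearest_centroid))
```

Neither classifier dominates: each dataset reverses the ranking, which is exactly the situation that makes a single-technique accelerator brittle.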

In this study, we present an ML accelerator called PuDianNao, which accommodates seven representative ML techniques: k-means, k-nearest neighbors, naive Bayes, support vector machine, linear regression, classification tree, and deep neural network. Benefiting from a thorough analysis of the computational primitives and locality properties of these ML techniques, PuDianNao can perform up to 1056 GOP/s (e.g., additions and multiplications) in an area of 3.51 mm^2 while consuming only 596 mW. Compared with the NVIDIA K20M GPU (28 nm process), PuDianNao (65 nm process) is 1.20x faster and reduces energy by 128.41x.
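The quoted peak throughput and power imply a peak arithmetic efficiency that follows from one line of arithmetic (a back-of-the-envelope figure derived only from the two numbers above, not a measured benchmark result):

```python
peak_gops = 1056.0      # peak throughput quoted above, in GOP/s
power_w   = 0.596       # power quoted above, 596 mW
gops_per_watt = peak_gops / power_w
print(round(gops_per_watt, 1))  # roughly 1771.8 GOP/s per watt at peak
```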


Published in

ACM SIGPLAN Notices, Volume 50, Issue 4 (ASPLOS '15), April 2015, 676 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2775054
Editor: Andy Gill

ASPLOS '15: Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, March 2015, 720 pages
ISBN: 9781450328357
DOI: 10.1145/2694344

Copyright © 2015 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
