SHARP: An Adaptable, Energy-Efficient Accelerator for Recurrent Neural Networks

Published: 24 January 2023

Abstract

The effectiveness of Recurrent Neural Networks (RNNs) for tasks such as Automatic Speech Recognition has fostered interest in RNN inference acceleration. Because of the recurrent nature and data dependencies of RNN computations, prior work has designed customized architectures tailored to the RNN computation pattern, achieving high computational efficiency for a small set of chosen model sizes. However, since RNN dimensionality varies widely across tasks, it is crucial to generalize this efficiency to diverse configurations.
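To make the dependency problem concrete, the sketch below (not from the paper; the dimensions, weights, and function name are hypothetical) shows a vanilla RNN cell in Python: each hidden state h_t depends on h_{t-1}, so the time loop is inherently serial, unlike the fully parallel matrix products of a feed-forward layer.

```python
import numpy as np

# Minimal vanilla-RNN cell, illustrating the recurrent data dependency:
# every step needs h from the previous step, so the time loop cannot be
# parallelized across time the way a feed-forward layer can.
def rnn_forward(x_seq, W_x, W_h, b):
    h = np.zeros(W_h.shape[0])
    outputs = []
    for x_t in x_seq:                          # serial over time steps
        h = np.tanh(W_x @ x_t + W_h @ h + b)   # h_t depends on h_{t-1}
        outputs.append(h)
    return np.stack(outputs)

# Example with arbitrary (hypothetical) sizes:
# 50 time steps, 128-dim input, 256-dim hidden state.
T, d_in, d_h = 50, 128, 256
x_seq = np.random.randn(T, d_in)
W_x = np.random.randn(d_h, d_in) * 0.01
W_h = np.random.randn(d_h, d_h) * 0.01
b = np.zeros(d_h)
print(rnn_forward(x_seq, W_x, W_h, b).shape)   # (50, 256)
```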

In this work, we identify adaptiveness as a key feature that is missing from today’s RNN accelerators. In particular, we first show the problem of low resource utilization and low adaptiveness for the state-of-the-art RNN implementations on GPU, FPGA, and ASIC architectures. To solve these issues, we propose an intelligent tiled-based dispatching mechanism for increasing the adaptiveness of RNN computation, in order to efficiently handle the data dependencies. To do so, we propose Sharp as a hardware accelerator, which pipelines RNN computation using an effective scheduling scheme to hide most of the dependent serialization. Furthermore, Sharp employs dynamic reconfigurable architecture to adapt to the model’s characteristics.
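The following Python sketch illustrates the general idea of tile-based dispatch and dependency hiding under our own simplifying assumptions; it is a conceptual illustration, not Sharp’s actual scheduler or datapath. The tile size, helper names (tile_rows, rnn_tiled), and dimensions are hypothetical: weight matrices are split into row tiles that can be dispatched to however many compute units are available, and the input projections W_x·x_t, which have no recurrent dependency, are computed off the critical path, leaving only the W_h·h_{t-1} products serialized across time steps.

```python
import numpy as np

# Conceptual sketch (not Sharp's actual scheduler): split the weight matrices
# into row tiles so a fixed pool of multiply-accumulate units can be kept busy
# regardless of the model's hidden size, and precompute the input-dependent
# products W_x @ x_t, since only W_h @ h_{t-1} lies on the recurrent critical
# path. Tile size and dimensions here are hypothetical.
TILE = 64

def tile_rows(W, tile=TILE):
    return [W[i:i + tile] for i in range(0, W.shape[0], tile)]

def rnn_tiled(x_seq, W_x, W_h, b, tile=TILE):
    wx_tiles, wh_tiles = tile_rows(W_x, tile), tile_rows(W_h, tile)
    # Input projections have no recurrent dependency: an accelerator can
    # stream them through the datapath while h_{t-1} is still being produced.
    x_proj = [np.concatenate([wt @ x_t for wt in wx_tiles]) for x_t in x_seq]
    h = np.zeros(W_h.shape[0])
    outputs = []
    for t in range(len(x_seq)):
        # Only this part is serialized by the h_{t-1} -> h_t dependency; the
        # tiles of W_h are mutually independent and can be dispatched to
        # whatever number of compute units the hardware provides.
        h_proj = np.concatenate([wt @ h for wt in wh_tiles])
        h = np.tanh(x_proj[t] + h_proj + b)
        outputs.append(h)
    return np.stack(outputs)

# Example: a hidden size that is not a multiple of the tile size still works,
# because the last row tile is simply smaller.
T, d_in, d_h = 20, 100, 200
out = rnn_tiled(np.random.randn(T, d_in),
                np.random.randn(d_h, d_in) * 0.01,
                np.random.randn(d_h, d_h) * 0.01,
                np.zeros(d_h))
print(out.shape)  # (20, 200)
```

In hardware, the same decomposition lets a scheduler overlap the input-projection tiles of upcoming time steps with the recurrent tiles of the current one, which is the kind of serialization hiding the abstract refers to.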

Sharp achieves average speedups of 2×, 2.8×, and 82× over state-of-the-art ASIC, FPGA, and GPU implementations, respectively, across different RNN models and resource budgets. Furthermore, Sharp delivers significant energy reductions with respect to these previous solutions, owing to its high energy efficiency (321 GFLOPS/Watt).



Published in

ACM Transactions on Embedded Computing Systems, Volume 22, Issue 2
March 2023, 560 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3572826
Editor: Tulika Mitra

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 24 January 2023
        • Online AM: 12 August 2022
        • Accepted: 18 July 2022
        • Revised: 4 April 2022
        • Received: 11 October 2021
Published in TECS Volume 22, Issue 2


        Qualifiers

        • research-article
        • Refereed
