Abstract
The effectiveness of Recurrent Neural Networks (RNNs) for tasks such as Automatic Speech Recognition has fostered interest in RNN inference acceleration. Due to the recurrent nature and data dependencies of RNN computations, prior work has designed customized architectures tailored to RNN computation patterns, achieving high computational efficiency for certain chosen model sizes. However, since the dimensionality of RNNs varies widely across tasks, it is crucial to generalize this efficiency to diverse configurations.
In this work, we identify adaptiveness as a key feature that is missing from today’s RNN accelerators. We first demonstrate the low resource utilization and low adaptiveness of state-of-the-art RNN implementations on GPU, FPGA, and ASIC architectures. To address these issues, we propose an intelligent tile-based dispatching mechanism that increases the adaptiveness of RNN computation and efficiently handles its data dependencies. Building on this mechanism, we propose Sharp, a hardware accelerator that pipelines RNN computation with an effective scheduling scheme to hide most of the dependency-induced serialization. Furthermore, Sharp employs a dynamically reconfigurable architecture to adapt to each model’s characteristics.
Sharp achieves average speedups of 2×, 2.8×, and 82× over state-of-the-art ASIC, FPGA, and GPU implementations, respectively, across different RNN models and resource budgets. It also delivers significant energy reductions over these previous solutions, owing to its low power dissipation (321 GFLOPS/Watt).
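The data dependency the abstract refers to is visible in the vanilla RNN recurrence h_t = tanh(W x_t + U h_{t-1}): the recurrent product U h_{t-1} serializes timesteps, while the input-side products W x_t are dependency-free and can be dispatched ahead of time. The following NumPy sketch is only an illustration of that general principle under our own simplified model, not Sharp's actual dataflow or tiling scheme:

```python
import numpy as np

def rnn_serial(x_seq, W, U, h0):
    # Vanilla RNN: h_t = tanh(W x_t + U h_{t-1}).
    # The U h_{t-1} term forces timestep t to wait for timestep t-1.
    h = h0
    for x in x_seq:
        h = np.tanh(W @ x + U @ h)
    return h

def rnn_tiled(x_seq, W, U, h0, tile=4):
    # Same recurrence, restructured: the input-side products W x_t have
    # no recurrent dependency, so they can all be computed up front
    # (in hardware, this work can fill the pipeline while the recurrent
    # product of the previous timestep is still pending).
    pre = [W @ x for x in x_seq]  # dependency-free work
    h = h0
    for p in pre:
        # Split the recurrent matrix-vector product into row-block
        # tiles, mimicking a tile-based dispatcher that can assign
        # independent tiles to parallel compute units.
        acc = np.concatenate(
            [U[r:r + tile] @ h for r in range(0, U.shape[0], tile)]
        )
        h = np.tanh(p + acc)
    return h
```

Both functions compute identical hidden states; the second merely exposes which parts of the computation are independent (the precomputed W x_t products and the individual row-block tiles within one timestep) and which remain serialized (the loop over timesteps).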
SHARP: An Adaptable, Energy-Efficient Accelerator for Recurrent Neural Networks