When Massive GPU Parallelism Ain’t Enough: A Novel Hardware Architecture of 2D-LSTM Neural Network

Published: 9 November 2021
Abstract

The multidimensional Long Short-Term Memory (MD-LSTM) neural network is an extension of the one-dimensional LSTM to data with more than one dimension. MD-LSTM achieves state-of-the-art results in various applications, including handwritten text recognition and medical imaging. However, its implementation suffers from inherently sequential execution that tremendously slows down both training and inference compared to other neural networks.

The main goal of this work is to accelerate MD-LSTM inference. We advocate the Field-Programmable Gate Array (FPGA) as an alternative platform for deep learning, one that can offer a solution when the massive parallelism of GPUs does not deliver the performance required by the application.

In this article, we present the first hardware architecture for MD-LSTM. We conduct a systematic exploration of the tradeoff between precision and accuracy, using a challenging dataset for semantic segmentation, namely historical document image binarization from the DIBCO 2017 contest, and the well-known MNIST dataset for handwritten digit recognition. Based on our new architecture, we implement FPGA-based accelerators that outperform an Nvidia GeForce RTX 2080 Ti in throughput by up to 9.9× and an Nvidia Jetson AGX Xavier in energy efficiency by up to 48×. Our accelerators achieve higher throughput, energy efficiency, and resource efficiency than FPGA-based implementations of convolutional neural networks (CNNs) for semantic segmentation tasks. For the handwritten digit recognition task, our FPGA implementations provide higher accuracy and can be considered a solution when accuracy is a priority. Furthermore, they outperform earlier FPGA implementations of one-dimensional LSTMs with respect to throughput, energy efficiency, and resource efficiency.
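The sequential dependency described in the abstract can be illustrated with a minimal NumPy sketch of a 2D-LSTM forward pass. The function name, gate arrangement, and shapes here are illustrative assumptions, not the paper's exact formulation: each pixel's state depends on its left and top neighbours, so the scan cannot be parallelized freely across pixels.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def md_lstm_2d(x, W, b, hidden):
    """Illustrative 2D-LSTM forward pass over an image (a sketch, not
    the paper's architecture).

    x: input of shape (H, W_img, F). Each pixel's cell state depends on
    its left and top neighbours -- the inherently sequential dependency
    that limits GPU parallelism.
    W: gate weights of shape (5 * hidden, F + 2 * hidden); b: bias.
    """
    H, W_img, _ = x.shape
    h = np.zeros((H + 1, W_img + 1, hidden))  # hidden states, zero-padded
    c = np.zeros((H + 1, W_img + 1, hidden))  # cell states, zero-padded
    for i in range(1, H + 1):
        for j in range(1, W_img + 1):
            # Concatenate the input pixel with left and top hidden states.
            z = np.concatenate([x[i - 1, j - 1],
                                h[i, j - 1],    # left neighbour
                                h[i - 1, j]])   # top neighbour
            gates = W @ z + b                   # all five gates at once
            in_g, f_left, f_top, out_g, cand = np.split(gates, 5)
            # Two forget gates, one per incoming direction.
            c[i, j] = (sigmoid(f_left) * c[i, j - 1]
                       + sigmoid(f_top) * c[i - 1, j]
                       + sigmoid(in_g) * np.tanh(cand))
            h[i, j] = sigmoid(out_g) * np.tanh(c[i, j])
    return h[1:, 1:]
```

Note that only pixels on the same anti-diagonal are independent of one another, which is exactly the limited wavefront parallelism a dedicated hardware pipeline can exploit more efficiently than a GPU.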


Published in: ACM Transactions on Reconfigurable Technology and Systems, Volume 15, Issue 1 (March 2022), 262 pages
ISSN: 1936-7406; EISSN: 1936-7414
DOI: 10.1145/3494949
Editor: Deming Chen


Publisher: Association for Computing Machinery, New York, NY, United States

Publication History
• Published: 9 November 2021
• Accepted: 1 June 2021
• Revised: 1 April 2021
• Received: 1 December 2020

Qualifiers: research-article, refereed
