Abstract
The language model is a widely used component in fields such as natural language processing, automatic speech recognition, and optical character recognition. Statistical machine translation in particular relies heavily on language models: both translation speed and memory consumption are strongly affected by the performance of the language model implementation.
We propose a fast and compact implementation of n-gram language models that increases query speed and reduces memory usage by using a double-array structure, a trie representation known to be both fast and compact. We propose two variants of the implementation: one for backward suffix trees and the other for reverse tries. The data structure is optimized for space efficiency by embedding model parameters into otherwise unused slots of the double-array structure.
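To make the underlying data structure concrete, the following is a minimal Python sketch of a double-array trie: each state `s` stores a `BASE` value, and the transition on character `c` goes to slot `t = BASE[s] + code(c)`, which is valid only if `CHECK[t] == s`. This is a generic illustration of the classic double-array technique, not the authors' implementation; the function names (`build`, `lookup`, `code`) are illustrative, the toy builder assumes lowercase ASCII keys, and it omits the paper's optimization of embedding n-gram parameters (probabilities and backoff weights) into the unused slots.

```python
from collections import deque

def code(ch):
    """Map a character to a positive transition code (toy lowercase alphabet)."""
    return ord(ch) - ord('a') + 1

def build(keys):
    """Build BASE/CHECK arrays from a key set via a plain dict-trie (unoptimized)."""
    # First build an ordinary pointer-based trie; '$' marks end of key.
    root = {}
    for key in keys:
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
        node['$'] = True

    base, check, terminal = [0], [-1], set()

    def ensure(size):  # grow both arrays; -1 in CHECK means "slot unused"
        while len(base) < size:
            base.append(0)
            check.append(-1)

    # Assign each trie node a slot, breadth-first.
    queue = deque([(0, root)])
    while queue:
        s, node = queue.popleft()
        if node.get('$'):
            terminal.add(s)
        children = sorted(c for c in node if c != '$')
        if not children:
            continue
        # Find the smallest BASE whose child slots are all free.
        b = 1
        while True:
            slots = [b + code(c) for c in children]
            ensure(max(slots) + 1)
            if all(check[t] == -1 for t in slots):
                break
            b += 1
        base[s] = b
        for c in children:
            t = b + code(c)
            check[t] = s          # CHECK records the parent state
            queue.append((t, node[c]))
    return base, check, terminal

def lookup(key, base, check, terminal):
    """Follow transitions from the root; CHECK validates each move."""
    s = 0
    for ch in key:
        t = base[s] + code(ch)
        if t >= len(check) or check[t] != s:
            return False
        s = t
    return s in terminal
```

Because a transition is two array reads and one comparison, queries are cache-friendly and constant-time per character, which is what makes the structure attractive for n-gram lookup; the space cost is the unused slots between occupied ones, which the paper's parameter-embedding trick reclaims.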
We show that on perplexity-calculation tasks, the reverse-trie version of our method yields one of the smallest model sizes among state-of-the-art implementations while running at almost the same speed as the fastest. On translation tasks, it likewise achieves faster decoding with compact model sizes, confirming that our method exploits the efficiency of the double-array structure to balance speed and size.
A Fast and Compact Language Model Implementation Using Double-Array Structures