Abstract
Existing supervised solutions for Named Entity Recognition (NER) typically rely on a large annotated corpus. Collecting a large NER-annotated corpus is time-consuming and requires considerable human effort. Collecting a small annotated corpus is feasible for any language, but performance then degrades due to data sparsity. We address data sparsity by borrowing features from the data of a closely related language. We train a supervised NER system using hierarchical neural networks, and the feature borrowing happens via the shared layers of the network. The network is trained on the combined dataset of the low-resource language and a closely related language, a setup also termed multilingual learning. Unlike existing systems, we share all layers of the network between the two languages. We apply multilingual learning to NER in Indian languages and empirically show its benefits over a monolingual deep learning system and a traditional machine-learning system with some feature engineering. We show that, with multilingual learning, NER performance on the low-resource language improves mainly due to (1) an increased named entity vocabulary, (2) cross-lingual subword features, and (3) multilingual learning acting as a regularizer.
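The first two effects named in the abstract can be illustrated at the data level with a small sketch. This is not the paper's model: it contains no neural network, and the toy sentences, the placeholder tokens (`w1`, `w2`, `w3`), the trigram size, and all function names are assumptions made purely for illustration. It only shows how concatenating a low-resource dataset with a related-language dataset enlarges the named entity vocabulary and the set of character n-grams available to an unseen word.

```python
def char_ngrams(word, n=3):
    """Return the set of character n-grams of a word, with boundary markers."""
    padded = f"<{word}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def entity_vocab(sentences):
    """Collect every token that carries a non-O tag."""
    return {tok for sent in sentences for tok, tag in sent if tag != "O"}


def subword_vocab(sentences, n=3):
    """Collect the character n-grams of every token in the dataset."""
    vocab = set()
    for sent in sentences:
        for tok, _ in sent:
            vocab |= char_ngrams(tok, n)
    return vocab


def covered_ngrams(word, vocab, n=3):
    """Count how many n-grams of `word` were already seen in `vocab`."""
    return len(char_ngrams(word, n) & vocab)


# Toy (token, tag) sentences; hypothetical, not taken from any corpus.
low_resource = [[("Delhi", "B-LOC"), ("w1", "O")]]
related = [
    [("Kanpur", "B-LOC"), ("w2", "O")],
    [("Pune", "B-LOC"), ("w3", "O")],
]

# Multilingual learning, as described in the abstract, trains on the combined dataset.
combined = low_resource + related

unseen = "Nagpur"  # a word absent from both toy datasets
print(sorted(entity_vocab(low_resource)), sorted(entity_vocab(combined)))
print(covered_ngrams(unseen, subword_vocab(low_resource)),
      covered_ngrams(unseen, subword_vocab(combined)))
```

In this toy setup the unseen word shares no trigrams with the low-resource data alone, but it shares the suffix trigrams of `Kanpur` once the related-language data is added. That is the kind of cross-lingual subword overlap a shared character-level layer could exploit when the two languages use a common script.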
Improving NER Tagging Performance in Low-Resource Languages via Multilingual Learning