Improving NER Tagging Performance in Low-Resource Languages via Multilingual Learning

Published: 14 December 2018

Abstract

Existing supervised solutions for Named Entity Recognition (NER) typically rely on a large annotated corpus. Collecting a large NER-annotated corpus is time-consuming and requires considerable human effort. Collecting a small annotated corpus is feasible for any language, but a system trained on it performs poorly because of data sparsity. We address data sparsity by borrowing features from the data of a closely related language. We use hierarchical neural networks to train a supervised NER system. Feature borrowing from the closely related language happens via the shared layers of the network. The neural network is trained on the combined dataset of the low-resource language and a closely related language, an approach termed multilingual learning. Unlike existing systems, we share all layers of the network between the two languages. We apply multilingual learning to NER in Indian languages and empirically show its benefits over a monolingual deep learning system and a traditional machine-learning system with some feature engineering. We show that multilingual learning improves NER performance for the low-resource language mainly through (1) an increased named entity vocabulary, (2) cross-lingual subword features, and (3) the regularizing effect of multilingual learning.
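The data-combination step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it only shows how pooling a low-resource corpus with a related-language corpus enlarges the named entity vocabulary seen in training (effect 1), and it leaves out the hierarchical network and its shared layers entirely. The function names `combine_corpora` and `entity_vocab`, the toy sentences, and the BIO-style tags are all assumptions made for illustration.

```python
import random
from typing import List, Set, Tuple

# A sentence is a list of (token, tag) pairs; "O" marks non-entity tokens.
Sentence = List[Tuple[str, str]]


def entity_vocab(corpus: List[Sentence]) -> Set[str]:
    """Return the set of tokens that carry a named-entity tag."""
    return {token for sentence in corpus for token, tag in sentence if tag != "O"}


def combine_corpora(low: List[Sentence], related: List[Sentence], seed: int = 0) -> List[Sentence]:
    """Pool both corpora into one shuffled training set (hypothetical helper)."""
    combined = list(low) + list(related)
    random.Random(seed).shuffle(combined)
    return combined


# Toy data, invented for this sketch (not taken from the paper's corpora).
low_resource = [
    [("Ravi", "B-PER"), ("lives", "O"), ("in", "O"), ("Pune", "B-LOC")],
]
related_language = [
    [("Mumbai", "B-LOC"), ("is", "O"), ("large", "O")],
    [("Ravi", "B-PER"), ("visited", "O"), ("Delhi", "B-LOC")],
]

combined = combine_corpora(low_resource, related_language)
new_entities = sorted(entity_vocab(combined) - entity_vocab(low_resource))
print(new_entities)  # → ['Delhi', 'Mumbai']
```

In a full system, a single model with all layers shared would be trained on `combined`; here the pooled set is only inspected to show which entity tokens the related language contributes.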

