Abstract
Building computational resources and tools for under-resourced languages is challenging for any Natural Language Processing (NLP) task. This article presents the first dependency parser for Nepali, an under-resourced Indian language. A prerequisite for developing a parser for a language is a treebank, that is, a corpus annotated with the desired linguistic representations. With the aims of cross-lingual learning and typological research, we use a Bengali treebank to build a Bengali-Nepali parallel corpus and apply annotation projection from the Bengali treebank to build a treebank for Nepali. Using the resulting treebank, we built Nepali parser models with MaltParser (using all of its algorithms for projective dependency structures) and with a neural network-based parser. The neural network-based parser produced state-of-the-art results on the gold-standard test data, with an Unlabeled Attachment Score (UAS) of 81.2, a Label Accuracy (LA) of 73.2, and a Labeled Attachment Score (LAS) of 66.1. The parser models were also evaluated on test data with predicted part-of-speech (POS) tags, assigned by a statistical POS tagger based on Conditional Random Fields (CRF) that was developed for this purpose.
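To make the two core ideas above concrete, here is a minimal sketch of (a) direct annotation projection of dependency arcs from a source-language tree onto a target sentence through word alignments, and (b) the three evaluation metrics reported (UAS, LA, LAS). It is an illustration under stated assumptions, not the authors' implementation: the function names `project_tree` and `attachment_scores`, the one-to-one alignment dictionary, and the toy labels (`subj`, `obj`, `root`) are all hypothetical. Real projection must also handle one-to-many and unaligned words, which can leave tokens without heads or produce non-tree structures; the metrics here are computed over all tokens, whereas some evaluations exclude punctuation.

```python
from typing import Dict, List, Optional, Tuple

# A tree is a list of (head, label) pairs, one per token.
# Heads are 1-based token indices; 0 marks the root.
Tree = List[Tuple[int, str]]


def project_tree(src_tree: Tree, alignment: Dict[int, int],
                 tgt_len: int) -> List[Optional[Tuple[int, str]]]:
    """Directly project arcs from a source tree onto a target sentence.

    Assumption (hypothetical setup): `alignment` maps 1-based source token
    indices to 1-based target token indices and is one-to-one. Target tokens
    that receive no projected arc are left as None for later completion.
    """
    tgt: List[Optional[Tuple[int, str]]] = [None] * tgt_len
    for dep_src, (head_src, label) in enumerate(src_tree, start=1):
        if dep_src not in alignment:
            continue  # unaligned source dependent: nothing to project
        dep_tgt = alignment[dep_src]
        if head_src == 0:
            tgt[dep_tgt - 1] = (0, label)  # the root stays the root
        elif head_src in alignment:
            # Copy the arc, re-indexed into the target sentence.
            tgt[dep_tgt - 1] = (alignment[head_src], label)
    return tgt


def attachment_scores(gold: Tree, pred: Tree) -> Tuple[float, float, float]:
    """Return (UAS, LA, LAS) as percentages over all tokens.

    UAS: correct head; LA: correct label; LAS: correct head and label.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred))
    la = sum(g[1] == p[1] for g, p in zip(gold, pred))
    las = sum(g == p for g, p in zip(gold, pred))
    return 100 * uas / n, 100 * la / n, 100 * las / n


if __name__ == "__main__":
    # Toy 3-token source tree projected onto a 4-token target sentence
    # whose third token has no alignment.
    src = [(3, "subj"), (3, "obj"), (0, "root")]
    print(project_tree(src, {1: 1, 2: 2, 3: 4}, 4))
    # One wrong head out of three tokens.
    gold = [(2, "subj"), (0, "root"), (2, "obj")]
    pred = [(2, "subj"), (0, "root"), (1, "obj")]
    print(attachment_scores(gold, pred))
```

In the toy run, the projected target tree is `[(4, 'subj'), (4, 'obj'), None, (0, 'root')]`: the unaligned third token receives no head. In the metric example, every label is correct but one head is wrong, so LA is 100 while UAS and LAS are both about 66.7. This also shows why LAS can never exceed either UAS or LA.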
Annotation Projection-based Dependency Parser Development for Nepali