Annotation Projection-based Dependency Parser Development for Nepali

Published: 27 December 2022

Abstract

Building computational resources and tools for under-resourced languages is a strenuous undertaking for any Natural Language Processing task. This article presents the first dependency parser for Nepali, an under-resourced Indian language. A prerequisite for developing a parser for a language is a treebank, that is, a corpus annotated with the desired linguistic representations. With the aim of supporting cross-lingual learning and typological research, we use a Bengali treebank to build a Bengali-Nepali parallel corpus and apply annotation projection from the Bengali treebank to build a treebank for Nepali. Using the developed treebank, we train Nepali parser models with MaltParser (with all of its algorithms for projective dependency structures) and with a neural network-based parser. The neural network-based parser produced state-of-the-art results, with an Unlabeled Attachment Score of 81.2, a Label Accuracy of 73.2, and a Labeled Attachment Score of 66.1 on the gold test data. The parser models have also been evaluated on test data with predicted part-of-speech (POS) tags, for which we developed a statistical POS tagger based on Conditional Random Fields.
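
The three evaluation scores named in the abstract can be illustrated with a minimal sketch, assuming their standard token-level definitions: Unlabeled Attachment Score (UAS) is the percentage of tokens with the correct head, Label Accuracy (LA) the percentage with the correct dependency label, and Labeled Attachment Score (LAS) the percentage with both correct. The function name, the token representation, and the toy labels below are hypothetical and are not taken from the article's evaluation code.

```python
from typing import Dict, List, Tuple

# A token is (head_index, dependency_label); head 0 denotes the artificial root.
# This representation is an assumption made for illustration only.
Token = Tuple[int, str]


def attachment_scores(gold: List[List[Token]], pred: List[List[Token]]) -> Dict[str, float]:
    """Return UAS, LA and LAS as percentages over all tokens."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted data differ in sentence count")
    total = uas = la = las = 0
    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("gold and predicted sentences differ in length")
        for (g_head, g_label), (p_head, p_label) in zip(g_sent, p_sent):
            head_ok = g_head == p_head
            label_ok = g_label == p_label
            total += 1
            uas += head_ok                # correct head, label ignored
            la += label_ok                # correct label, head ignored
            las += head_ok and label_ok   # both head and label correct
    if total == 0:
        raise ValueError("no tokens to evaluate")
    return {
        "UAS": 100.0 * uas / total,
        "LA": 100.0 * la / total,
        "LAS": 100.0 * las / total,
    }


# Toy three-token sentence; the labels are illustrative placeholders.
gold = [[(2, "k1"), (0, "root"), (2, "k2")]]
pred = [[(2, "k1"), (0, "root"), (1, "k2")]]  # third token attached to the wrong head
scores = attachment_scores(gold, pred)
```

Because a token counts toward LAS only when both its head and its label are correct, LAS can never exceed UAS or LA, which matches the ordering of the reported figures (66.1, 73.2, and 81.2).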

    • Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 2
      February 2023
      624 pages
      ISSN: 2375-4699
      EISSN: 2375-4702
      DOI: 10.1145/3572719

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 27 December 2022
      • Online AM: 11 June 2022
      • Accepted: 29 May 2022
      • Revised: 12 April 2022
      • Received: 5 June 2021

      Qualifiers

      • research-article
      • Refereed