Development of Automatic Rule-based Semantic Tagger and Karaka Analyzer for Hindi

Published: 18 November 2021

Abstract

Hindi is the third most-spoken language in the world (615 million speakers) and has the fourth-largest number of native speakers (341 million). It is an inflectionally rich, relatively free word-order language with an immense vocabulary. Despite its global prominence, very few Natural Language Processing (NLP) applications and tools have been developed to support it computationally. Moreover, most existing tools are not efficient enough because they lack semantic information (contextual knowledge). Hindi grammar is based on Paninian grammar and derives most of its rules from it. Paninian grammar strongly emphasizes the role of karaka theory in free word-order languages. In this article, we present an application that extracts all possible karakas from simple Hindi sentences with an accuracy of 84.2% and an F1 score of 88.5%. We consider features such as part-of-speech tags, postposition markers (vibhaktis), semantic tags for nouns, and syntactic structure to capture context in word windows of different sizes within a sentence. Using these features, we built a rule-based inference engine to extract karakas from a sentence. The application takes as input a text file of clean (punctuation-free) simple Hindi sentences and writes the karaka-tagged sentences to a separate text file.
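
To make the idea of a rule-based karaka extractor concrete, here is a minimal Python sketch. It is not the authors' inference engine: the abstract does not list the actual rules, so the vibhakti-to-karaka table below uses only the traditional textbook associations, the semantic tags (`person`, `place`, `instrument`, `food`) are hypothetical, and the POS labels (`NN`, `NNP`, `PSP`, `VM`) are assumed to follow a common Indian-language tagset. The sketch looks one word to the right of each noun for a postposition and uses the noun's semantic tag to disambiguate the marker से.

```python
# Hypothetical, simplified vibhakti -> karaka table. These are traditional
# textbook associations, NOT the rule set used in the article.
VIBHAKTI_TO_KARAKA = {
    "ने": "karta",        # agent
    "को": "karma",        # object
    "में": "adhikarana",  # locus (in)
    "पर": "adhikarana",   # locus (on)
}

NOUN_TAGS = {"NN", "NNP"}  # assumed POS labels for common and proper nouns


def tag_karakas(tokens):
    """Return (noun, karaka) pairs for nouns followed by a known vibhakti.

    tokens: list of (word, pos_tag, semantic_tag) triples for one sentence.
    """
    tagged = []
    for i, (word, pos, sem) in enumerate(tokens):
        if pos not in NOUN_TAGS:
            continue
        # Context window: the single token to the right of the noun.
        if i + 1 >= len(tokens) or tokens[i + 1][1] != "PSP":
            continue
        marker = tokens[i + 1][0]
        if marker == "से":
            # Assumed disambiguation rule: a place noun marks a source
            # (apadana); any other noun marks an instrument (karana).
            karaka = "apadana" if sem == "place" else "karana"
        else:
            karaka = VIBHAKTI_TO_KARAKA.get(marker)
        if karaka is not None:
            tagged.append((word, karaka))
    return tagged


# "Ram cut the fruit with a knife" (punctuation-free, pre-tagged input).
sentence = [
    ("राम", "NNP", "person"),
    ("ने", "PSP", None),
    ("चाकू", "NN", "instrument"),
    ("से", "PSP", None),
    ("फल", "NN", "food"),
    ("को", "PSP", None),
    ("काटा", "VM", None),
]

for noun, karaka in tag_karakas(sentence):
    print(noun, karaka)
```

A real system would need wider context windows and many more rules, for example for unmarked nouns and for markers such as को that can signal more than one karaka.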


    • Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 2
      March 2022
      413 pages
      ISSN: 2375-4699
      EISSN: 2375-4702
      DOI: 10.1145/3494070

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 18 November 2021
      • Accepted: 1 August 2021
      • Revised: 1 July 2021
      • Received: 1 December 2020

      Qualifiers

      • research-article
      • Refereed