research-article

Reduplication in Assamese: Identification and Modeling

Published: 17 May 2022

Abstract

Reduplication is a productive morphological process found in a substantial number of the world's languages. It is well studied in linguistics, and several typological works document different types of reduplication in most of these languages. Handling reduplication correctly matters for the performance of POS taggers, sentiment analysis, and other NLP tasks. However, it remains understudied in computational linguistics, especially for low-resource languages such as Assamese. This article first describes the types of reduplication that occur in Assamese and their shapes. Second, it compiles an exhaustive set of reduplication formation rules and incorporates them into a system that identifies reduplication in Assamese text. Experiments on datasets from three different domains show that the rule-based system identifies reduplicated expressions with average precision, recall, and F1 scores of 94.19%, 98.07%, and 96.07%, respectively. Third, it shows that Assamese reduplication processes can be captured by a two-way finite-state transducer (2-way FST). Finally, it presents two broad categories of reduplicative processes along with their corresponding 2-way FST models.
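
To make the 2-way FST idea concrete, the following is a minimal Python sketch of total reduplication (copying a whole word), the textbook case that a one-way transducer cannot express for unbounded inputs. It is not the model presented in the article: the state names, the end markers `<` and `>`, the single-space separator between the two copies, and the function name `reduplicate` are all assumptions made for illustration. The machine copies the input left to right, rewinds the read head to the left edge without emitting anything, and then copies the input a second time.

```python
# Hypothetical end markers delimiting the input tape; the word itself
# is assumed not to contain these two characters.
LEFT, RIGHT = "<", ">"

# Transition table: (state, symbol class) -> (next state, output, head move).
# The symbol class is LEFT, RIGHT, or "sym" for any ordinary input symbol.
# Output "copy" emits the symbol under the head; any other string is emitted literally.
# Head move +1 steps right, -1 steps left (the "two-way" part).
DELTA = {
    ("q_start", LEFT): ("q_copy1", "", +1),
    ("q_copy1", "sym"): ("q_copy1", "copy", +1),   # first pass: copy the word
    ("q_copy1", RIGHT): ("q_rewind", "", -1),      # reached the right edge, turn around
    ("q_rewind", "sym"): ("q_rewind", "", -1),     # move left silently
    ("q_rewind", LEFT): ("q_copy2", " ", +1),      # emit separator, start second pass
    ("q_copy2", "sym"): ("q_copy2", "copy", +1),   # second pass: copy the word again
    ("q_copy2", RIGHT): ("q_final", "", +1),
}


def reduplicate(word: str) -> str:
    """Run the sketch transducer on `word` and return the emitted string."""
    tape = LEFT + word + RIGHT
    state, pos, out = "q_start", 0, []
    while state != "q_final":
        ch = tape[pos]
        key = ch if ch in (LEFT, RIGHT) else "sym"
        state, emit, move = DELTA[(state, key)]
        out.append(ch if emit == "copy" else emit)
        pos += move
    return "".join(out)


if __name__ == "__main__":
    print(reduplicate("abc"))
```

The table has a fixed number of states regardless of word length, which is the point of the two-way formulation: the copy is obtained by re-reading the input rather than by memorizing it in the state set.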



    • Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 5
      September 2022
      486 pages
      ISSN:2375-4699
      EISSN:2375-4702
      DOI:10.1145/3533669

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 17 May 2022
      • Online AM: 3 February 2022
      • Accepted: 1 January 2022
      • Revised: 1 December 2021
      • Received: 1 November 2020

      Qualifiers

      • research-article
      • Refereed
