Abstract
Reduplication is a productive morphological process found in a substantial number of the world's languages. It is well studied in linguistics, and several typological works document different types of reduplication in most languages worldwide. Handling reduplication correctly matters for the performance of POS taggers, sentiment analysis, and other NLP tasks. However, it remains understudied in computational linguistics, especially for low-resource languages such as Assamese. This article first describes the types of reduplication that occur in Assamese and their shapes. Second, it compiles an exhaustive set of reduplication formation rules and incorporates them into a system that identifies reduplication in Assamese text. In experiments on datasets from three different domains, the rule-based system identified reduplicated expressions with average precision, recall, and F1 scores of 94.19%, 98.07%, and 96.07%, respectively. Third, it shows that Assamese reduplication processes can be captured by a two-way finite-state transducer (2-way FST). Finally, it presents two broad categories of reduplicative processes along with their corresponding 2-way FST models.
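To make the 2-way FST idea concrete, the following is a minimal sketch of how a two-way machine can produce total reduplication (a full copy of the input). It is not the article's Assamese model: the state names, the boundary markers `<` and `>`, the `-` separator, and the function name `total_reduplication` are all assumptions chosen for illustration. The key point it shows is that the read head may move left as well as right, so the input can be scanned twice without memorizing it.

```python
# Illustrative simulation of a 2-way finite-state transducer for total
# reduplication. All names and markers here are hypothetical.

LEFT, RIGHT = "<", ">"  # assumed boundary markers; must not occur in the word


def total_reduplication(word: str, sep: str = "-") -> str:
    """Return word + sep + word by scanning the input tape twice."""
    tape = LEFT + word + RIGHT
    pos, state, out = 0, "start", []

    while state != "done":
        sym = tape[pos]
        if state == "start":
            # Head is on the left boundary: step right and begin copying.
            state, pos = "copy1", pos + 1
        elif state == "copy1":
            if sym == RIGHT:
                # First copy finished: emit the separator and turn around.
                out.append(sep)
                state, pos = "rewind", pos - 1
            else:
                out.append(sym)
                pos += 1
        elif state == "rewind":
            if sym == LEFT:
                # Back at the start: step right and copy again.
                state, pos = "copy2", pos + 1
            else:
                # Move left without emitting any output.
                pos -= 1
        elif state == "copy2":
            if sym == RIGHT:
                state = "done"
            else:
                out.append(sym)
                pos += 1

    return "".join(out)


if __name__ == "__main__":
    print(total_reduplication("abc"))
```

The machine uses a fixed number of states regardless of word length, which is what distinguishes this approach from a one-way transducer that would have to encode every possible base in its states. Partial reduplication would need different copy conditions, which this sketch does not attempt.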
Index Terms
Reduplication in Assamese: Identification and Modeling