research-article

Sentiment Analysis Using XLM-R Transformer and Zero-shot Transfer Learning on Resource-poor Indian Language

Published: 30 June 2021

Abstract

Sentiment analysis on social media relies on comprehending natural language and on a robust machine learning technique that learns multiple layers of representations, or features, of the data and produces state-of-the-art predictions. Cultural diversity, geographically limited trending-topic hashtags, access to native-language keyboards, and conversational comfort in one's native language compound the linguistic challenges of sentiment analysis. This research evaluates the performance of cross-lingual contextual word embeddings and zero-shot transfer learning in projecting predictions from resource-rich English to resource-poor Hindi. The cross-lingual XLM-RoBERTa classification model is trained and fine-tuned on the English-language benchmark dataset of SemEval 2017 Task 4A; zero-shot transfer learning is then used to evaluate the model on two Hindi sentence-level sentiment analysis datasets, the IITP-Movie and IITP-Product review datasets. The proposed model compares favorably to state-of-the-art approaches, achieving an average accuracy of 60.93 across the two Hindi datasets, and offers an effective solution to sentence-level (tweet-level) sentiment analysis in a resource-poor scenario.
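
The zero-shot evaluation protocol described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the classifier is reduced to a plain `predict` callable (in practice it would wrap an XLM-RoBERTa model fine-tuned on English data only), and the names `zero_shot_report`, `toy_predict`, and `TOY_TARGETS`, along with the toy Hindi sentences and labels, are hypothetical and not taken from the paper. The sketch also assumes that the reported average is an unweighted mean of the per-dataset accuracies, which the abstract does not state.

```python
from typing import Callable, Dict, Sequence, Tuple

Example = Tuple[str, str]  # (sentence, gold sentiment label)


def accuracy(predict: Callable[[str], str], data: Sequence[Example]) -> float:
    """Fraction of sentences whose predicted label equals the gold label."""
    if not data:
        raise ValueError("dataset is empty")
    correct = sum(1 for text, gold in data if predict(text) == gold)
    return correct / len(data)


def zero_shot_report(
    predict: Callable[[str], str],
    target_sets: Dict[str, Sequence[Example]],
) -> Dict[str, float]:
    """Score a source-language classifier on each target-language dataset.

    No target-language example is used for training: the datasets are only
    passed to `predict` for evaluation. The "average" entry is the unweighted
    mean over datasets (an assumption of this sketch).
    """
    scores = {name: accuracy(predict, data) for name, data in target_sets.items()}
    scores["average"] = sum(scores.values()) / len(scores)
    return scores


def toy_predict(text: str) -> str:
    """Hypothetical stand-in for a fine-tuned cross-lingual classifier."""
    if "अच्छ" in text:  # stem shared by "अच्छा" / "अच्छी" (good)
        return "positive"
    if "बेकार" in text:  # "useless"
        return "negative"
    return "neutral"


# Hypothetical toy data, NOT drawn from the IITP-Movie or IITP-Product datasets.
TOY_TARGETS: Dict[str, Sequence[Example]] = {
    "movie_toy": [
        ("यह फिल्म बहुत अच्छी है", "positive"),
        ("कहानी बेकार है", "negative"),
    ],
    "product_toy": [
        ("कैमरा अच्छा है", "positive"),
        ("फोन की बैटरी खराब है", "negative"),
    ],
}

if __name__ == "__main__":
    for name, score in zero_shot_report(toy_predict, TOY_TARGETS).items():
        print(f"{name}: {score:.2f}")
```

Keeping the classifier behind a callable means the same harness could score any model that maps a sentence to a label, so a cross-lingual transformer and a monolingual baseline would be compared on identical terms.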

      • Published in

        ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 20, Issue 5
        September 2021
        320 pages
        ISSN:2375-4699
        EISSN:2375-4702
        DOI:10.1145/3467024

        Copyright © 2021 Association for Computing Machinery.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 30 June 2021
        • Accepted: 1 April 2021
        • Revised: 1 March 2021
        • Received: 1 October 2020
        Qualifiers

        • research-article
        • Refereed
