Abstract
Sentiment analysis on social media relies on understanding natural language and on robust machine learning techniques that learn multiple layers of representations (features) of the data and produce state-of-the-art predictions. Cultural diversity, geographically limited trending hashtags, access to indigenous-language keyboards, and users' conversational comfort in their native languages compound the linguistic challenges of sentiment analysis. This research evaluates how well cross-lingual contextual word embeddings and zero-shot transfer learning project predictions from resource-rich English to resource-poor Hindi. The cross-lingual XLM-RoBERTa classification model is trained and fine-tuned on the English benchmark dataset of SemEval-2017 Task 4A, and zero-shot transfer learning is then used to evaluate the classifier on two Hindi sentence-level sentiment analysis datasets, namely the IITP-Movie and IITP-Product review datasets. The proposed model compares favorably to state-of-the-art approaches, achieving an average accuracy of 60.93% across the two Hindi datasets, and offers an effective solution to sentence-level (tweet-level) sentiment analysis in a resource-poor scenario.
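The abstract reports a single average accuracy over the two Hindi test sets but does not say how the average is formed. A minimal sketch, assuming it is the unweighted (macro) mean of the per-dataset accuracies; a pooled accuracy over all test sentences is shown for contrast. The function names, the results layout, and the toy labels are hypothetical illustrations, not the authors' code, data, or results.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical layout: dataset name -> (gold labels, predicted labels)
Results = Dict[str, Tuple[Sequence[str], Sequence[str]]]


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Percentage of predictions that exactly match the gold labels."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    correct = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * correct / len(gold)


def macro_average_accuracy(results: Results) -> float:
    """Unweighted mean of per-dataset accuracies: each test set counts equally."""
    if not results:
        raise ValueError("results must contain at least one dataset")
    scores = [accuracy(gold, pred) for gold, pred in results.values()]
    return sum(scores) / len(scores)


def pooled_accuracy(results: Results) -> float:
    """Accuracy over all test items concatenated: larger test sets weigh more."""
    gold = [g for gs, _ in results.values() for g in gs]
    pred = [p for _, ps in results.values() for p in ps]
    return accuracy(gold, pred)


if __name__ == "__main__":
    # Toy labels for illustration only; these are NOT the IITP data or results.
    toy = {
        "movie (toy)": (["pos", "neg", "neu", "pos"], ["pos", "neg", "pos", "neg"]),  # 2/4 correct
        "product (toy)": (["pos", "neg"], ["pos", "neg"]),                            # 2/2 correct
    }
    print(f"macro:  {macro_average_accuracy(toy):.2f}")  # (50 + 100) / 2 = 75.00
    print(f"pooled: {pooled_accuracy(toy):.2f}")         # 4 / 6 correct = 66.67
```

Macro averaging treats both review datasets equally regardless of size, while pooling weighs each by its number of test sentences; the two figures coincide only when the test sets are the same size or the per-dataset accuracies are equal.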
Sentiment Analysis Using XLM-R Transformer and Zero-shot Transfer Learning on Resource-poor Indian Language