Abstract
Linguistic resources for widely used languages such as English and Mandarin Chinese are abundant, which explains the large body of existing research in these languages. For other languages, however, linguistic resources are scarce. Hindi is one of them: although it is the fourth most widely spoken language, it still lacks richly populated linguistic resources, owing to the challenges involved in processing Hindi text. This article first explores machine learning-based approaches (Naïve Bayes, Support Vector Machine, Decision Tree, and Logistic Regression) to analyze the sentiment expressed in Hindi-language text collected from Twitter.
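As a rough illustration of the first family of approaches, the sketch below trains a multinomial Naïve Bayes classifier, one of the four classifiers listed above, using only the Python standard library. The six training tweets, the whitespace tokenizer, and the class name `MultinomialNaiveBayes` are hypothetical and are not taken from the article's dataset or code; a realistic pipeline would also normalize Hindi text and would likely use a library implementation.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical whitespace tokenizer; real Hindi preprocessing
    # (normalization, stop-word removal, stemming) is not shown.
    return text.split()


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # label -> number of documents
        self.word_counts = defaultdict(Counter)   # label -> token frequencies
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values())
                            for c in self.class_counts}
        return self

    def log_posterior(self, doc, label):
        # Unnormalized log P(label | doc) = log prior + sum of log likelihoods.
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.total_words[label] + len(self.vocab)
        for tok in tokenize(doc):
            if tok in self.vocab:  # tokens unseen in training are skipped
                score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def predict(self, doc):
        # Return the label with the highest unnormalized log posterior.
        return max(self.class_counts, key=lambda c: self.log_posterior(doc, c))


# Hypothetical toy training tweets (not from the article's dataset).
TRAIN = [
    ("यह फिल्म बहुत अच्छी है", "positive"),
    ("शानदार प्रदर्शन", "positive"),
    ("यह फिल्म बहुत खराब है", "negative"),
    ("बेकार कहानी", "negative"),
    ("फिल्म आज रिलीज हुई", "neutral"),
    ("कल मैच है", "neutral"),
]

model = MultinomialNaiveBayes().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])
print(model.predict("फिल्म अच्छी है"))  # positive
```

Add-one smoothing keeps a word that never co-occurred with a class from driving that class's probability to zero, which matters for short, sparse texts such as tweets.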
Further, the article presents lexicon-based approaches (Hindi Senti-WordNet and the NRC Emotion Lexicon) for Hindi sentiment analysis and proposes a domain-specific sentiment dictionary. Finally, it proposes an integrated model that combines a convolutional neural network (CNN) with a recurrent neural network (RNN) and long short-term memory (LSTM) to analyze the sentiment of Hindi tweets; a total of 23,767 tweets are classified as positive, negative, or neutral. The proposed integrated CNN approach achieves an accuracy of 85%.
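The lexicon-based approach can be sketched as summing word polarities from a general lexicon merged with a domain-specific dictionary, then mapping the total to one of the three classes. The scores below are hypothetical placeholders, not real Hindi Senti-WordNet or NRC Emotion Lexicon values, and the rule that domain entries override general ones is an assumption about how such a dictionary might be combined, not the article's stated method.

```python
# Hypothetical general-purpose polarity scores in [-1, 1]; these are
# placeholders, not real Hindi Senti-WordNet or NRC Emotion Lexicon values.
GENERAL_LEXICON = {
    "अच्छा": 1.0,   # good
    "खराब": -1.0,   # bad
    "दुखी": -0.75,  # sad
}

# Hypothetical domain-specific dictionary (e.g. for political tweets).
DOMAIN_LEXICON = {
    "महंगाई": -1.0,  # inflation
    "विकास": 0.75,   # development
}


def build_lexicon(general, domain):
    """Merge lexicons; domain entries override general ones (an assumption)."""
    merged = dict(general)
    merged.update(domain)
    return merged


def classify(text, lexicon, threshold=0.0):
    """Sum token polarities and map the total to positive/negative/neutral."""
    score = sum(lexicon.get(tok, 0.0) for tok in text.split())
    if score > threshold:
        return "positive", score
    if score < -threshold:
        return "negative", score
    return "neutral", score


lexicon = build_lexicon(GENERAL_LEXICON, DOMAIN_LEXICON)
print(classify("विकास अच्छा है", lexicon))       # ('positive', 1.75)
print(classify("महंगाई बहुत खराब है", lexicon))  # ('negative', -2.0)
print(classify("आज बारिश हुई", lexicon))         # ('neutral', 0.0)
```

With the general lexicon alone, a domain term such as विकास scores 0 and a tweet containing only that term falls into the neutral class; supplying domain-specific polarities is presumably the gap a domain-specific sentiment dictionary is meant to close.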
Toward Integrated CNN-based Sentiment Analysis of Tweets for Scarce-resource Language—Hindi