Abstract
Sentiment Analysis (SA) is a Natural Language Processing (NLP) and Information Extraction (IE) task that primarily aims to identify the positive or negative feelings a writer expresses by analyzing a large number of documents. SA is also widely studied in the fields of data mining, web mining, text mining, and information retrieval. The fundamental task in sentiment analysis is to classify the polarity of a given piece of content as Positive, Negative, or Neutral. Although extensive research has been conducted in this area of computational linguistics, most of it has been carried out in the context of the English language. Bengali sentiment expression, however, carries varying degrees of sentiment that can plausibly differ from those in English. Sentiment assessment for the Bengali language therefore needs to be developed and carried out properly. In sentiment analysis, the predictive power of an automatic model depends entirely on the quality of dataset annotation. Bengali sentiment annotation is a challenging task because of the diverse syntactic structures of the language and the different degrees of sentiment it expresses (i.e., weakly and strongly positive/negative sentiments). Thus, in this article, we propose a novel and precise guideline that researchers, linguistic experts, and referees can use to annotate Bengali sentences accurately, with a view to building effective datasets for automatic sentiment prediction.
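As a rough illustration of the ideas above, the following Python sketch encodes a graded label set and measures how closely two annotators agree. It is only a sketch under stated assumptions: the five label names, the `polarity` helper, and the use of Cohen's kappa as the agreement measure are illustrative choices made here, not the label scheme or evaluation procedure defined by the guideline itself.

```python
from collections import Counter
from enum import Enum


class Sentiment(Enum):
    """Hypothetical graded labels: weak/strong positive and negative, plus neutral."""
    STRONGLY_NEGATIVE = -2
    WEAKLY_NEGATIVE = -1
    NEUTRAL = 0
    WEAKLY_POSITIVE = 1
    STRONGLY_POSITIVE = 2


def polarity(label):
    """Collapse a graded label into the coarse Positive/Negative/Neutral polarity."""
    if label.value > 0:
        return "Positive"
    if label.value < 0:
        return "Negative"
    return "Neutral"


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items (assumed metric)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Fraction of items on which the two annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Made-up annotations for five sentences, for demonstration only.
    annotator_a = [Sentiment.STRONGLY_POSITIVE, Sentiment.WEAKLY_POSITIVE,
                   Sentiment.NEUTRAL, Sentiment.WEAKLY_NEGATIVE,
                   Sentiment.STRONGLY_NEGATIVE]
    annotator_b = [Sentiment.WEAKLY_POSITIVE, Sentiment.WEAKLY_POSITIVE,
                   Sentiment.NEUTRAL, Sentiment.STRONGLY_NEGATIVE,
                   Sentiment.STRONGLY_NEGATIVE]
    print("graded kappa:", cohen_kappa(annotator_a, annotator_b))
    print("coarse kappa:", cohen_kappa([polarity(x) for x in annotator_a],
                                       [polarity(x) for x in annotator_b]))
```

Comparing agreement on the graded labels with agreement on the collapsed polarity gives a quick indication of whether annotators disagree about the direction of a sentiment or only about its strength.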
A Comprehensive Guideline for Bengali Sentiment Annotation