Abstract
Sentiment Analysis (SA) is one of the most active research areas in Natural Language Processing (NLP) because of its potential value for business and society. With the development of language representation models, numerous methods have shown promising results when fine-tuning pre-trained language models on downstream NLP tasks. For Vietnamese, many pre-trained language models have been released, both monolingual and multilingual. However, these models differ in architecture, pre-training data, and pre-processing steps; consequently, fine-tuning them can be expected to yield different levels of effectiveness. In addition, no study to date has evaluated these models on the same datasets for the SA task. This article presents a fine-tuning approach for investigating the performance of different pre-trained language models on the Vietnamese SA task. The experimental results show that the monolingual PhoBERT and ViT5 models outperform previous studies and set new state-of-the-art results on five benchmark Vietnamese SA datasets. To the best of our knowledge, this study is the first to investigate fine-tuning Transformer-based models on five datasets of different domains and sizes for the Vietnamese SA task.
Vietnamese Sentiment Analysis: An Overview and Comparative Study of Fine-tuning Pretrained Language Models