Two New Large Corpora for Vietnamese Aspect-based Sentiment Analysis at Sentence Level

Published: 26 May 2021
Abstract

Aspect-based sentiment analysis has been widely studied in both the research and industrial communities in recent years. For low-resource languages, standard benchmark corpora play an important role in the development of methods. In this article, we introduce the two largest sentence-level benchmark corpora for two tasks in Vietnamese: Aspect Category Detection and Aspect Polarity Classification. Our corpora cover the restaurant and hotel domains and are annotated with high inter-annotator agreement. We expect the release of these corpora to push forward the low-resource language processing community. In addition, we deploy and compare supervised learning methods under single-task and multi-task approaches based on deep learning architectures. Experimental results on our corpora show that the multi-task approach based on the BERT architecture outperforms the other neural network architectures and the single-task approach. Our corpora and source code are publicly available at the site given in footnote 1.
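To make the difference between the two setups concrete, the sketch below shows one common way to encode a sentence's labels for each approach. In the single-task setup, Aspect Category Detection (a multi-hot vector over categories) and Aspect Polarity Classification (a polarity for each present aspect) are treated as separate problems. In the multi-task setup, every aspect receives one joint class from {none, positive, negative, neutral}. This is an illustrative assumption, not necessarily the exact formulation used by the authors. The category inventory in `ASPECTS` and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical aspect inventory (illustrative only; the real corpora define their own categories).
ASPECTS: List[str] = ["FOOD#QUALITY", "FOOD#PRICES", "SERVICE#GENERAL", "AMBIENCE#GENERAL"]
POLARITIES: List[str] = ["positive", "negative", "neutral"]
# Assumed multi-task label space: one 4-way label per aspect; index 0 means "aspect not mentioned".
MULTI_TASK_CLASSES: List[str] = ["none"] + POLARITIES

Annotation = List[Tuple[str, str]]  # (aspect category, polarity) pairs for one sentence


def single_task_targets(ann: Annotation) -> Tuple[List[int], Dict[str, int]]:
    """Split one sentence's labels into two separate problems.

    ACD: a multi-hot vector over ASPECTS (which aspects are present).
    APC: a polarity index (into POLARITIES) for each aspect that is present.
    """
    present = dict(ann)
    acd = [1 if a in present else 0 for a in ASPECTS]
    apc = {a: POLARITIES.index(p) for a, p in present.items()}
    return acd, apc


def multi_task_targets(ann: Annotation) -> List[int]:
    """Encode both tasks jointly: one class index per aspect, 0 = absent."""
    present = dict(ann)
    return [MULTI_TASK_CLASSES.index(present.get(a, "none")) for a in ASPECTS]


def decode_multi_task(pred: List[int]) -> Annotation:
    """Recover (aspect, polarity) pairs from per-aspect class predictions."""
    return [(a, MULTI_TASK_CLASSES[c]) for a, c in zip(ASPECTS, pred) if c != 0]


if __name__ == "__main__":
    sentence_labels = [("FOOD#QUALITY", "positive"), ("SERVICE#GENERAL", "negative")]
    print(single_task_targets(sentence_labels))
    print(multi_task_targets(sentence_labels))
    print(decode_multi_task(multi_task_targets(sentence_labels)))
```

Under this framing, a single-task pipeline trains one model for the multi-hot detection target and another for the per-aspect polarity target. A multi-task model can instead share one encoder (for example, a BERT-style encoder) and attach one 4-way output head per aspect, so detection and polarity are learned jointly.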
