
Automatic Indonesian Sentiment Lexicon Curation with Sentiment Valence Tuning for Social Media Sentiment Analysis

Published: 02 March 2021

Abstract

A novel Indonesian sentiment lexicon, SentIL (Sentiment Indonesian Lexicon), is created with an automatic pipeline: generating sentiment seed words; adding new words from slang, emoticons, a given dictionary, and a sentiment corpus; and tuning sentiment values against a tagged sentiment corpus. The pipeline begins by taking seed words from WordNet Bahasa that are mapped to sentiment values from the English SentiWordNet. The seed words are then enriched by combining a dictionary-based method, which uses each word's synonyms and antonyms, with a corpus-based method, which uses word embeddings for word similarity trained on positive and negative sentiment corpora from online marketplace reviews and Twitter data. The valence score of each lexicon entry is recalculated based on its relative occurrence in the corpus. We also add some popular slang words and emoticons to enrich the lexicon. Our experiments show that the proposed method yields a 3.5-fold increase in the number of lexicon entries and reaches an accuracy of 80.9% on online reviews and 95.7% on Twitter data, outperforming other published and available Indonesian sentiment lexicons.
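
The valence-tuning step can be illustrated with a minimal sketch. The abstract does not give the tuning formula, so the smoothed ratio `(p - n) / (p + n + smoothing)` below is an assumption chosen only for illustration, as are the function names `tune_valence` and `classify`, the whitespace tokenizer, and the toy seed scores and review snippets.

```python
from collections import Counter


def tune_valence(lexicon, pos_docs, neg_docs, smoothing=1.0):
    """Re-score each lexicon word by its relative occurrence in
    positive versus negative documents (assumed formula, not the paper's)."""
    pos_counts = Counter(tok for doc in pos_docs for tok in doc.lower().split())
    neg_counts = Counter(tok for doc in neg_docs for tok in doc.lower().split())
    tuned = {}
    for word, seed_score in lexicon.items():
        p, n = pos_counts[word], neg_counts[word]
        if p + n == 0:
            # Word never seen in the tagged corpus: keep the seed score.
            tuned[word] = seed_score
        else:
            # Score in (-1, 1); smoothing damps words with few occurrences.
            tuned[word] = (p - n) / (p + n + smoothing)
    return tuned


def classify(text, lexicon):
    """Sum the valence of known tokens; a non-negative sum counts as positive."""
    score = sum(lexicon.get(tok, 0.0) for tok in text.lower().split())
    return "positive" if score >= 0 else "negative"


# Hypothetical seed lexicon and tagged corpus, for illustration only.
seed_lexicon = {"bagus": 0.5, "jelek": -0.5, "mantap": 0.1}
pos_docs = ["barang bagus mantap", "bagus sekali"]
neg_docs = ["barang jelek", "jelek sekali mantap"]

tuned = tune_valence(seed_lexicon, pos_docs, neg_docs)
```

In this toy corpus, "mantap" appears once in each class, so its tuned score drops to zero, while "bagus" and "jelek" move further toward their respective poles. A real pipeline would need proper tokenization and normalization of informal text.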



Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 20, Issue 1
Special issue on Deep Learning for Low-Resource Natural Language Processing, Part 1 and Regular Papers
January 2021
332 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3439335

Copyright © 2021 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 2 March 2021
• Accepted: 1 September 2020
• Revised: 1 June 2020
• Received: 1 July 2019


Qualifiers

• research-article
• Research
• Refereed
