Abstract
A novel Indonesian sentiment lexicon, SentIL (Sentiment Indonesian Lexicon), is created with an automatic pipeline that runs from creating sentiment seed words, through adding new words (slang words, emoticons, and words drawn from a given dictionary and sentiment corpus), to tuning sentiment values with a tagged sentiment corpus. The pipeline begins by taking seed words from WordNet Bahasa that are mapped to sentiment values from the English SentiWordNet. The seed words are enriched by combining a dictionary-based method, which uses the words' synonyms and antonyms, with a corpus-based method, which uses word embeddings for word similarity trained on positive and negative sentiment corpora from online marketplace reviews and Twitter data. The valence score of each lexicon entry is recalculated based on its relative occurrence in the corpus. We also add some popular slang words and emoticons to enrich the lexicon. Our experiments show that the proposed method increases the number of lexicon entries by a factor of 3.5 and improves accuracy to 80.9% on online reviews and 95.7% on Twitter data, outperforming other published and available Indonesian sentiment lexicons.
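The valence tuning step can be illustrated with a minimal sketch. The abstract does not give the formula, so everything here is an assumption for illustration only: the function name `tune_valence`, the blending parameter `weight`, the polarity measure `(p - n) / (p + n)`, and the toy seed scores and review sentences are all hypothetical and are not the authors' method.

```python
from collections import Counter


def tune_valence(lexicon, pos_docs, neg_docs, weight=0.5):
    """Blend each word's seed score with a corpus-based polarity.

    Assumption (not from the paper): the corpus polarity of a word is
    (p - n) / (p + n), where p and n are its relative frequencies in the
    positive and negative corpora. `weight` controls how much the corpus
    evidence overrides the seed score.
    """
    pos_counts = Counter(tok for doc in pos_docs for tok in doc.lower().split())
    neg_counts = Counter(tok for doc in neg_docs for tok in doc.lower().split())
    pos_total = sum(pos_counts.values()) or 1  # avoid division by zero
    neg_total = sum(neg_counts.values()) or 1

    tuned = {}
    for word, seed in lexicon.items():
        p = pos_counts[word] / pos_total
        n = neg_counts[word] / neg_total
        if p + n == 0:
            # Word never occurs in either corpus: keep the seed score.
            tuned[word] = seed
            continue
        corpus_polarity = (p - n) / (p + n)  # in [-1, 1]
        tuned[word] = (1 - weight) * seed + weight * corpus_polarity
    return tuned


# Hypothetical seed scores and toy tagged reviews (illustrative data only).
seed_lexicon = {"bagus": 0.5, "jelek": -0.25, "mantap": 0.0, "lambat": -0.5}
positive_reviews = ["barang bagus mantap", "bagus sekali"]
negative_reviews = ["barang jelek", "jelek sekali"]

tuned = tune_valence(seed_lexicon, positive_reviews, negative_reviews)
for word, score in tuned.items():
    print(f"{word}: {score:+.3f}")
```

In this sketch a word that appears only in positive reviews is pulled toward +1, a word that appears only in negative reviews is pulled toward -1, and a word absent from both corpora keeps its seed value.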
Index Terms
Automatic Indonesian Sentiment Lexicon Curation with Sentiment Valence Tuning for Social Media Sentiment Analysis