Abstract
A sentiment lexicon is an important tool for natural language processing. Besides identifying the sentiment polarity of words or phrases, it can support attribute-level, sentence-level, and text-level sentiment analysis. Annotated data and corpora for the Khmer language are scarce, and most existing sentiment lexicon resources are for English. This paper therefore proposes a method for constructing a Khmer sentiment lexicon based on Positive-Unlabeled learning (PU learning) and the label propagation algorithm. First, sentiment words are extracted from a corpus using the Spy technique of PU learning. The main idea is to purify the set of N-class examples, train a multilayer perceptron (MLP) classifier, and, over successive iterations, remove spy words while increasing the number of P-class words. Next, the sentiment polarity of the candidate words is determined. This step is treated as estimating a probability distribution over polarities for each candidate word, and a graph model is built from a small number of labeled sentiment words together with the candidate words. The contextual information of the candidate words is used to build a simple supplementary graph over the set of sentiment words through word co-occurrence and triangulation, which strengthens the correlation between data items. The sentiment polarity of each candidate word is then determined by the label propagation algorithm. Experimental results show that the proposed method can construct a Khmer sentiment lexicon from a small amount of labeled data and a small corpus without extensive manual labeling.
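The Spy step can be illustrated with a minimal sketch. It assumes each word is already represented by a numeric feature vector (how the features are built is not specified here) and, to stay dependency-free, it swaps the paper's MLP for a plain logistic-regression classifier. A random sample of known sentiment words (the "spies") is hidden among the unlabeled words; unlabeled words that score below the spies' threshold are taken as reliable N-class examples, and a second classifier trained on P against those examples picks out further sentiment words. The function names, the 15% spy ratio and noise level, and the toy vectors are hypothetical, and the iterative loop described above (repeatedly removing spies and growing the P set) is reduced to a single pass.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg(X, y, epochs=500, lr=0.5):
    """Batch gradient-descent logistic regression.

    Hypothetical stand-in for the MLP classifier used in the paper.
    Returns a function mapping a feature vector to P(class 1).
    """
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return lambda x: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def spy_reliable_negatives(P, U, spy_ratio=0.15, noise=0.15, seed=0):
    """Return indices of U that the Spy technique marks as reliable N-class words."""
    rng = random.Random(seed)
    n_spies = max(1, int(len(P) * spy_ratio))
    spy_idx = set(rng.sample(range(len(P)), n_spies))
    spies = [P[i] for i in spy_idx]
    p_rest = [p for i, p in enumerate(P) if i not in spy_idx]
    # Train remaining positives (class 1) against unlabeled words plus spies (class 0).
    X = p_rest + U + spies
    y = [1] * len(p_rest) + [0] * (len(U) + len(spies))
    prob = train_logreg(X, y)
    # Threshold t: roughly a `noise` fraction of the spies score below it.
    spy_scores = sorted(prob(s) for s in spies)
    t = spy_scores[int(noise * len(spy_scores))]
    return [i for i, u in enumerate(U) if prob(u) < t]


def extract_sentiment_words(P, U, **kwargs):
    """Single pass: train P against the reliable N-class words, then label the rest of U."""
    rn = set(spy_reliable_negatives(P, U, **kwargs))
    negatives = [U[i] for i in sorted(rn)]
    prob = train_logreg(P + negatives, [1] * len(P) + [0] * len(negatives))
    return [i for i, u in enumerate(U) if i not in rn and prob(u) >= 0.5]


# Toy data (hypothetical 2-D feature vectors).
P = [[2.0, 2.0] for _ in range(10)]  # known sentiment words
U = [[2.0, 2.0], [2.0, 2.0]] + [[-2.0 - 0.1 * i, -2.0 + 0.05 * i] for i in range(10)]
found = extract_sentiment_words(P, U)  # indices of U predicted to be sentiment words
```

Because the spies are known P-class words, their scores indicate how low a genuine sentiment word can score when mixed into the unlabeled data, which is what makes the threshold meaningful. In the toy data, the two unlabeled vectors that match the known sentiment words are recovered and the other ten are kept as reliable N-class examples.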
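The polarity step can be sketched in the same spirit, using a common form of iterative label propagation. Assumptions: edge weights are plain sentence-level co-occurrence counts (the triangulation-based supplementary graph is not modelled), each word carries a (positive, negative) probability pair, seed words stay clamped to their labels, and every unlabeled word repeatedly takes the weighted average of its neighbours' distributions. The function names and the English toy corpus standing in for segmented Khmer text are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(sentences, vocab):
    """Symmetric edge weights: number of sentences in which two vocab words co-occur."""
    weights = defaultdict(float)
    for sent in sentences:
        present = sorted({tok for tok in sent if tok in vocab})
        for a, b in combinations(present, 2):
            weights[(a, b)] += 1.0
            weights[(b, a)] += 1.0
    return weights


def label_propagation(vocab, weights, seeds, iters=100):
    """Propagate (p_pos, p_neg) distributions from seed words to the other words.

    seeds maps a labeled word to its fixed distribution, e.g. (1.0, 0.0).
    Unlabeled words start at (0.5, 0.5).
    """
    neighbours = defaultdict(list)
    for (a, b), wt in weights.items():
        neighbours[a].append((b, wt))
    dist = {v: seeds.get(v, (0.5, 0.5)) for v in vocab}
    for _ in range(iters):
        new = {}
        for v in vocab:
            if v in seeds:
                new[v] = seeds[v]  # clamp labeled words to their seed labels
                continue
            total = sum(wt for _, wt in neighbours[v])
            if total == 0:
                new[v] = dist[v]  # isolated word: nothing to propagate
                continue
            # Weighted average of neighbours' positive probability (row-normalized update).
            p_pos = sum(wt * dist[u][0] for u, wt in neighbours[v]) / total
            new[v] = (p_pos, 1.0 - p_pos)
        dist = new
    return dist


# Toy corpus (hypothetical); "service" is not a candidate word, so it adds no edges.
sentences = [
    ["service", "good", "delicious"],
    ["delicious", "cheap"],
    ["bad", "dirty"],
    ["dirty", "slow", "service"],
    ["slow", "bad"],
]
vocab = {"good", "bad", "delicious", "cheap", "dirty", "slow"}
seeds = {"good": (1.0, 0.0), "bad": (0.0, 1.0)}
lexicon = label_propagation(vocab, cooccurrence_graph(sentences, vocab), seeds)
```

Here `cheap` is linked to the seed `good` only through `delicious`, yet its distribution converges towards positive, while `slow` converges towards negative; the seed words keep their labels throughout.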
Khmer Sentiment Lexicon Based on PU Learning and Label Propagation Algorithm