note

Khmer Sentiment Lexicon Based on PU Learning and Label Propagation Algorithm

Published: 10 March 2023

Abstract

A sentiment lexicon is an important tool for natural language processing. Besides determining the sentiment polarity of words or phrases, it can support attribute-level, sentence-level, and document-level sentiment analysis. Because annotated data and corpora for Khmer are scarce, and most existing sentiment-lexicon resources are for English, this paper proposes a method for constructing a Khmer sentiment lexicon based on Positive-Unlabeled learning (PU learning) and the label propagation algorithm. First, sentiment words are extracted from a corpus using the Spy technique of PU learning. The main idea is to purify the set of N-class examples and train an MLP classifier, iteratively removing spy words and enlarging the set of P-class words. Next, the sentiment polarity of the candidate words is determined. Treating this as the problem of computing each candidate word's probability distribution over polarities, a graph model is constructed from a small number of labeled sentiment words together with the candidate words. The contextual information of the candidate words is used to build a simple supplementary graph over the set of sentiment words through word co-occurrence and triangulation, which strengthens the correlation between data items. The sentiment polarity of each candidate word is then determined by the label propagation algorithm. Experimental results show that the proposed method can construct a Khmer sentiment lexicon from a small amount of labeled data and a small corpus without extensive manual labeling.
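The Spy step of PU learning can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes every word is already represented as a numeric feature vector (the abstract does not say how the paper builds these), it replaces the paper's MLP with a plain logistic-regression classifier written in pure Python so the example has no dependencies, and it shows only the single round that selects reliable N-class examples. The paper's iterative removal of spies and growth of the P set is omitted. The names `spy_reliable_negatives`, `spy_ratio`, and `noise` are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_logreg(X, y, epochs=300, lr=0.5):
    """Batch gradient-descent logistic regression (a stand-in for the paper's MLP).

    Returns a function mapping a feature vector to P(class = 1).
    """
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(dot(w, xi) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return lambda x: sigmoid(dot(w, x) + b)


def spy_reliable_negatives(P, U, spy_ratio=0.15, noise=0.15, seed=0):
    """Hypothetical helper: indices of U judged to be reliable N-class items.

    P: feature vectors of known sentiment words (the P class).
    U: feature vectors of unlabeled candidate words.
    """
    rng = random.Random(seed)
    k = max(1, int(len(P) * spy_ratio))
    spy_idx = set(rng.sample(range(len(P)), k))
    spies = [P[i] for i in sorted(spy_idx)]
    p_rest = [P[i] for i in range(len(P)) if i not in spy_idx]
    # Train P minus spies (label 1) against U plus spies (label 0),
    # so the spies behave like hidden positives inside the unlabeled set.
    X = p_rest + U + spies
    y = [1] * len(p_rest) + [0] * (len(U) + len(spies))
    prob = train_logreg(X, y)
    # Threshold t: roughly a `noise` fraction of spies score below it.
    spy_scores = sorted(prob(s) for s in spies)
    t = spy_scores[int(noise * len(spy_scores))]
    # Unlabeled items scoring below t are taken as reliable N-class examples.
    return [i for i, u in enumerate(U) if prob(u) < t]
```

Because the spies are known P-class items, their scores indicate how high a hidden sentiment word in U is likely to score; only unlabeled items that fall below that range are treated as reliable negatives, which is the sense in which the N set is "purified".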

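The polarity-assignment stage can be illustrated with the classic label propagation scheme, in which each unlabeled node repeatedly takes the weighted average of its neighbors' label distributions while the seed nodes stay clamped to their known labels. The sketch below assumes a graph weighted by raw co-occurrence counts within a fixed token window; the paper's triangulation step and its exact edge weighting are not described in the abstract and are not reproduced here. The function names, the `window` parameter, and the English placeholder tokens are hypothetical.

```python
from collections import defaultdict


def cooccurrence_graph(sentences, window=2):
    """Symmetric weighted graph: edge weight = co-occurrence count within `window` tokens."""
    W = defaultdict(lambda: defaultdict(float))
    for toks in sentences:
        for i, a in enumerate(toks):
            for j in range(i + 1, min(i + 1 + window, len(toks))):
                b = toks[j]
                if a != b:
                    W[a][b] += 1.0
                    W[b][a] += 1.0
    return W


def label_propagation(W, seeds, max_iter=1000, tol=1e-6):
    """Label propagation over a weighted graph with two polarity classes.

    seeds maps a word to "pos" or "neg"; returns word -> (p_pos, p_neg).
    """
    nodes = sorted(set(W) | set(seeds))
    p_pos = {v: 0.5 for v in nodes}  # uninformative prior for unlabeled words
    for v, lab in seeds.items():
        p_pos[v] = 1.0 if lab == "pos" else 0.0
    for _ in range(max_iter):
        new, delta = dict(p_pos), 0.0
        for v in nodes:
            nbrs = W.get(v, {})
            total = sum(nbrs.values())
            if v in seeds or total == 0:
                continue  # seeds stay clamped; isolated words keep their prior
            # Weighted average of the neighbors' current P(pos).
            new[v] = sum(w * p_pos[u] for u, w in nbrs.items()) / total
            delta = max(delta, abs(new[v] - p_pos[v]))
        p_pos = new
        if delta < tol:
            break
    return {v: (p, 1.0 - p) for v, p in p_pos.items()}
```

A candidate word would then be assigned the polarity with the larger probability. Storing only P(pos) is enough here because the two-class distribution always sums to one.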

    • Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 3
      March 2023
      570 pages
      ISSN: 2375-4699
      EISSN: 2375-4702
      DOI: 10.1145/3579816

      ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 10 March 2023
      • Online AM: 29 September 2022
      • Accepted: 14 September 2022
      • Revised: 16 August 2022
      • Received: 13 January 2021