Improving Word Embedding Coverage in Less-Resourced Languages Through Multi-Linguality and Cross-Linguality: A Case Study with Aspect-Based Sentiment Analysis

Published: 17 December 2018

Abstract

In the era of deep learning-based systems, efficient input representation is one of the primary requisites for solving problems in Natural Language Processing (NLP), data mining, text mining, and related areas. The absence of an adequate representation for an input introduces data sparsity, which makes the underlying problem considerably harder to solve. The problem is more acute for resource-poor languages, which lack the sufficiently large corpus required to train a word embedding model. In this work, we propose an effective method to improve word embedding coverage in less-resourced languages by leveraging bilingual word embeddings learned from different corpora. We train and evaluate a deep Long Short-Term Memory (LSTM)-based architecture and show the effectiveness of the proposed approach on two aspect-level sentiment analysis tasks: aspect term extraction and sentiment classification. The neural network architecture is further assisted by hand-crafted features for prediction. We apply the proposed model in two experimental setups: multi-lingual and cross-lingual. Experimental results show the effectiveness of the proposed approach compared with state-of-the-art methods.
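
The abstract does not spell out how bilingual embeddings raise coverage, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes a single shared bilingual vector space (`shared_space`) and a hypothetical translation lexicon (`lexicon`) that maps an out-of-vocabulary word to a word in the richer language. All names and the toy data are invented for illustration.

```python
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def lookup(word: str,
           shared_space: Dict[str, Vector],
           lexicon: Dict[str, str]) -> Optional[Vector]:
    """Return a vector for `word`, or None if no representation is found.

    Assumption: `shared_space` holds words of both languages in one
    bilingual embedding space, and `lexicon` is a hypothetical
    word-to-word translation table used only when the direct lookup misses.
    """
    if word in shared_space:
        return shared_space[word]
    translation = lexicon.get(word)
    if translation is not None:
        # Fall back to the translated word's vector, if it has one.
        return shared_space.get(translation)
    return None


def coverage(tokens: Sequence[str],
             shared_space: Dict[str, Vector],
             lexicon: Dict[str, str]) -> float:
    """Fraction of tokens for which `lookup` finds a vector."""
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens
               if lookup(t, shared_space, lexicon) is not None)
    return hits / len(tokens)


# Toy data (made up): two-dimensional vectors and romanized example words.
shared_space = {"khana": [0.1, 0.3], "food": [0.1, 0.2], "service": [0.7, 0.4]}
lexicon = {"seva": "service"}
tokens = ["khana", "seva", "swaad"]

direct = coverage(tokens, shared_space, {})            # no fallback
with_fallback = coverage(tokens, shared_space, lexicon)  # with fallback
```

In this toy setup the fallback recovers a vector for one additional token, so the measured coverage rises. Whether such a fallback helps on real data depends on the quality of the bilingual space and of the translation resource.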

