Abstract
In the era of deep learning-based systems, efficient input representation is a primary requirement for solving problems in Natural Language Processing (NLP), data mining, text mining, and related fields. Without an adequate representation of the input, data sparsity arises and makes the underlying problem much harder to solve. The problem is more severe for resource-poor languages, which lack the sufficiently large corpora required to train a word embedding model. In this work, we propose an effective method to improve word embedding coverage in less-resourced languages by leveraging bilingual word embeddings learned from different corpora. We train and evaluate a deep Long Short-Term Memory (LSTM)-based architecture and show the effectiveness of the proposed approach on two aspect-level sentiment analysis tasks: aspect term extraction and sentiment classification. The neural network architecture is further assisted by hand-crafted features for prediction. We apply the proposed model in two experimental setups: multi-lingual and cross-lingual. Experimental results show the effectiveness of the proposed approach against state-of-the-art methods.
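The abstract does not spell out how coverage is improved, so the following is only a minimal sketch of one possible reading: when a word has no vector on the less-resourced side of a shared bilingual embedding space, its translation's vector from the resource-rich side is used instead. The function names, the toy vectors, and the translation lexicon are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional

Vector = List[float]


def lookup(word: str,
           source_vectors: Dict[str, Vector],
           target_vectors: Dict[str, Vector],
           lexicon: Dict[str, str]) -> Optional[Vector]:
    """Return a vector for `word`, falling back to its translation's vector.

    Assumes both vector tables live in the same bilingual space, so a
    target-language vector can stand in for a missing source-language one.
    """
    if word in source_vectors:
        return source_vectors[word]
    translation = lexicon.get(word)  # hypothetical word-level lexicon
    if translation is not None:
        return target_vectors.get(translation)
    return None


def coverage(tokens: List[str],
             source_vectors: Dict[str, Vector],
             target_vectors: Dict[str, Vector],
             lexicon: Dict[str, str]) -> float:
    """Fraction of tokens for which some vector can be found."""
    if not tokens:
        return 0.0
    hits = sum(
        lookup(t, source_vectors, target_vectors, lexicon) is not None
        for t in tokens
    )
    return hits / len(tokens)


# Toy data, invented purely for illustration.
source_vectors = {"accha": [0.1, 0.2]}
target_vectors = {"good": [0.1, 0.3], "phone": [0.3, 0.4]}
lexicon = {"fon": "phone", "kitab": "book"}
tokens = ["accha", "fon", "xyz", "kitab"]

without_fallback = coverage(tokens, source_vectors, {}, {})
with_fallback = coverage(tokens, source_vectors, target_vectors, lexicon)
```

In this toy setup the fallback covers "fon" through its translation, while "kitab" stays uncovered because its translation has no vector either, which is the kind of residual sparsity a real system would still have to handle.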
Improving Word Embedding Coverage in Less-Resourced Languages Through Multi-Linguality and Cross-Linguality: A Case Study with Aspect-Based Sentiment Analysis