Abstract
While research on English opinion mining has already achieved significant progress and success, work on Arabic opinion mining is still lagging. This is mainly due to the relative recency of research efforts in developing natural language processing (NLP) methods for Arabic, handling its morphological complexity, and the lack of large-scale opinion resources for Arabic. To close this gap, we examine the class of models used for English and that do not require extensive use of NLP or opinion resources. In particular, we consider the Recursive Auto Encoder (RAE). However, RAE models are not as successful in Arabic as they are in English, due to their limitations in handling the morphological complexity of Arabic, providing a more complete and comprehensive input features for the auto encoder, and performing semantic composition following the natural way constituents are combined to express the overall meaning. In this article, we propose A Recursive Deep Learning Model for Opinion Mining in Arabic (AROMA) that addresses these limitations. AROMA was evaluated on three Arabic corpora representing different genres and writing styles. Results show that AROMA achieved significant performance improvements compared to the baseline RAE. It also outperformed several well-known approaches in the literature.
- Ahmed Abbasi, Hsinchun Chen, and Arab Salem. 2008. Sentiment analysis in multiple languages: Feature selection for opinion classification in web forums. ACM Trans. Inf. Syst. 26, 3 (2008), 12. Google Scholar
Digital Library
- Ahmed Abbasi, Stephen France, Zhu Zhang, and Hsinchun Chen. 2011. Selecting attributes for sentiment classification using feature relation networks. IEEE Trans. Knowl. Data Eng. 23, 3 (2011), 447--462. Google Scholar
Digital Library
- Muhammad Abdul-Mageed and Mona T. Diab. 2014. SANA: A large scale multi-genre, multi-dialect lexicon for arabic subjectivity and sentiment analysis. In Proceedings of the International Conference on Language Resources and Evaluation (LREC’14). 1162--1169.Google Scholar
- Muhammad Abdul-Mageed, Mona T. Diab, and Mohammed Korayem. 2011. Subjectivity and sentiment analysis of modern standard arabic. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers-Volume 2. Association for Computational Linguistics, 587--591. Google Scholar
Digital Library
- Rodrigo Agerri, Xabier Artola, Zuhaitz Beloki, German Rigau, and Aitor Soroa. 2015. Big data for natural language processing: A streaming approach. Knowl.-Based Syst. 79 (2015), 36--42. Google Scholar
Digital Library
- Mohammed N. Al-Kabi, Nawaf A. Abdulla, and Mahmoud Al-Ayyoub. 2013. An analytical study of arabic sentiments: Maktoob case study. In Proceedings of the 2013 8th International Conference for Internet Technology and Secured Transactions (ICITST’13). IEEE, 89--94.Google Scholar
- Ahmad A. Al Sallab, Ramy Baly, Gilbert Badaro, Hazem Hajj, Wassim El Hajj, and Khaled B. Shaban. 2015. Deep learning models for sentiment analysis in arabic. In ANLP Workshop 2015. 9 (July 2015).Google Scholar
- Fahad Alotaiby, Salah Foda, and Ibrahim Alkharashi. 2014. Arabic vs. english: comparative statistical study. Arab. J. Sci. Eng. 39, 2 (2014), 809--820.Google Scholar
Cross Ref
- Mohamed A. Aly and Amir F. Atiya. 2013. LABR: A large scale arabic book reviews dataset. In ACL (2). 494--498 (August 2013).Google Scholar
- Gilbert Badaro, Ramy Baly, Rana Akel, Linda Fayad, Jeffrey Khairallah, Hazem Hajj, Wassim El-Hajj, and Khaled Bashir Shaban. 2015. A light lexicon-based mobile application for sentiment mining of arabic tweets. In ANLP Workshop 2015. 18.Google Scholar
Cross Ref
- Gilbert Badaro, Ramy Baly, Hazem Hajj, Nizar Habash, and Wassim El-Hajj. 2014. A large scale arabic sentiment lexicon for arabic opinion mining. ANLP 2014, 165.Google Scholar
- Yoshua Bengio. 2012. Practical recommendations for gradient-based training of deep architectures. In Neural Networks: Tricks of the Trade. Springer, 437--478.Google Scholar
Digital Library
- William Black, Sabri Elkateb, Horacio Rodriguez, Musa Alkhalifa, Piek Vossen, Adam Pease, and Christiane Fellbaum. 2006. Introducing the arabic wordnet project. In Proceedings of the 3rd International WordNet Conference. Citeseer, 295--300.Google Scholar
- Erik Cambria and Amir Hussain. 2015. Sentic Computing: A Common-Sense-Based Framework for Concept-Level Sentiment Analysis. Vol. 1. Springer. Google Scholar
Digital Library
- Noam Chomsky. 1959. On certain formal properties of grammars. Inf. Control 2, 2 (1959), 137--167.Google Scholar
Cross Ref
- Ronan Collobert and Jason Weston. 2008. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning. ACM, 160--167. Google Scholar
Digital Library
- Ahmed El Kholy and Nizar Habash. 2012. Orthographic and morphological processing for english--arabic statistical machine translation. Mach. Transl. 26, 1--2 (2012), 25--45. Google Scholar
Digital Library
- Rasheed M. Elawady, Sherif Barakat, and M. Elrashidy Nora. 2014. Sentiment analyzer for arabic comments. Int. J. Inf. Sci. Intell. Syst. 3, 4 (2014), 73--86.Google Scholar
- Andrea Esuli and Fabrizio Sebastiani. 2006. Sentiwordnet: A publicly available lexical resource for opinion mining. In Proceedings of the International Conference on Language Resources and Evaluation (LREC’06), Vol. 6. Citeseer, 417--422.Google Scholar
- Noura Farra, Kathleen McKeown, and Nizar Habash. 2015. Annotating targets of opinions in arabic using crowdsourcing. In ANLP Workshop 2015. 89.Google Scholar
Cross Ref
- Alec Go, Richa Bhayani, and Lei Huang. 2009. Twitter sentiment classification using distant supervision. CS224N Proj. Rep. Stanf. 1 (2009), 12.Google Scholar
- Spence Green and Christopher D Manning. 2010. Better arabic parsing: Baselines, evaluations, and analysis. In Proceedings of the 23rd International Conference on Computational Linguistics. Association for Computational Linguistics, 394--402. Google Scholar
Digital Library
- Nizar Habash and Owen Rambow. 2005. Arabic tokenization, part-of-speech tagging and morphological disambiguation in one fell swoop. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, 573--580. Google Scholar
Digital Library
- Nizar Habash and Fatiha Sadat. 2006. Arabic preprocessing schemes for statistical machine translation. In Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers. Association for Computational Linguistics, 49--52. Google Scholar
Digital Library
- Nizar Y. Habash. 2010. Introduction to arabic natural language processing. Synth. Lect. Hum. Lang. Technol. 3, 1 (2010), 1--187.Google Scholar
Cross Ref
- Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh. 2006. A fast learning algorithm for deep belief nets. Neural Comput. 18, 7 (2006), 1527--1554. Google Scholar
Digital Library
- Hossam S. Ibrahim, Sherif M. Abdou, and Mervat Gheith. 2015. Sentiment analysis for modern standard arabic and colloquial. arXiv:1505.03105 (2015).Google Scholar
- Aamera Z. H. Khan, Mohammad Atique, and V. M. Thakare. 2015. Combining lexicon-based and learning-based methods for twitter sentiment analysis. International Journal of Electronics, Communication and Soft Computing Science 8 Engineering (IJECSCSE) (2015), 89.Google Scholar
- Efthymios Kouloumpis, Theresa Wilson, and Johanna D. Moore. 2011. Twitter sentiment analysis: The good the bad and the omg! Icwsm 11 (2011), 538--541.Google Scholar
- Bing Liu and Lei Zhang. 2012. A survey of opinion mining and sentiment analysis. In Mining Text Data. Springer, 415--463.Google Scholar
- Mohamed Maamouri, Ann Bies, Tim Buckwalter, and Wigdan Mekki. 2004. The penn arabic treebank: Building a large-scale annotated arabic corpus. In Proceedings of the Network for Euro-Mediterranean Language Resources (NEMLAR) Conference on Arabic Language Resources and Tools, Vol. 27. 466--467.Google Scholar
- Mohamed Maamouri, Ann Bies, Seth Kulick, Fatma Gaddeche, Wigdan Mekki, Sondos Krouna, Basma Bouziri, and Zaghouani Wajdi. 2010a. Arabic treebank: Part 1 v 4.1. LDC Catalog No. LDC2010T13. ISBN (2010).Google Scholar
- Mohamed Maamouri, Dave Graff, Basma Bouziri, Sondos Krouna, and Seth Kulick. 2010b. LDC standard arabic morphological analyzer (SAMA) v. 3.1. LDC Catalog No. LDC2010L01. ISBN (2010), 1--58563.Google Scholar
- T. Mikolov and J. Dean. 2013. Distributed representations of words and phrases and their compositionality. Adv. Neur. Inf. Process. Syst. (2013). Google Scholar
Digital Library
- George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine J. Miller. 1990. Introduction to wordnet: An on-line lexical database. Int. J. of Lexicogr. 3, 4 (1990), 235--244.Google Scholar
Cross Ref
- Behrang Mohit, Alla Rozovskaya, Nizar Habash, Wajdi Zaghouani, and Ossama Obeid. 2014. The first QALB shared task on automatic text correction for arabic. In Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP’14). 39--47.Google Scholar
Cross Ref
- Asmaa Mountassir, Houda Benbrahim, and Ilham Berrada. 2012. A cross-study of sentiment classification on arabic corpora. In Research and Development in Intelligent Systems XXIX. Springer, 259--272.Google Scholar
- Preslav Nakov, Sara Rosenthal, Svetlana Kiritchenko, Saif M. Mohammad, Zornitsa Kozareva, Alan Ritter, Veselin Stoyanov, and Xiaodan Zhu. 2016. Developing a successful semeval task in sentiment analysis of twitter and other social media texts. Lang. Resourc. Eval. 50, 1 (2016), 35--65. Google Scholar
Digital Library
- Nazlia Omar, Mohammed Albared, Adel Qasem Al-Shabi, and Tareq Al-Moslmi. 2013. Ensemble of classification algorithms for subjectivity and sentiment analysis of arabic customers’ reviews. Int. J. Adv. Comput. Technol. 5, 14 (2013), 77.Google Scholar
- Arfath Pasha, Mohamed Al-Badrashiny, Mona T. Diab, Ahmed El Kholy, Ramy Eskander, Nizar Habash, Manoj Pooleery, Owen Rambow, and Ryan Roth. 2014. MADAMIRA: A fast, comprehensive tool for morphological analysis and disambiguation of arabic. In Proceedings of the International Conference on Language Resources and Evaluation (LREC’14), Vol. 14. 1094--1101.Google Scholar
- Kumar Ravi and Vadlamani Ravi. 2015. A survey on opinion mining and sentiment analysis: Tasks, approaches and applications. Knowl.-Based Syst. 89 (2015), 14--46. Google Scholar
Digital Library
- Eshrag Refaee and Verena Rieser. 2014. An arabic twitter corpus for subjectivity and sentiment analysis. In Proceedings of the International Conference on Language Resources and Evaluation (LREC’14). 2268--2273.Google Scholar
- Mohammed Rushdi-Saleh, M. Teresa Martín-Valdivia, L. Alfonso Ureña-López, and José M. Perea-Ortega. 2011. OCA: Opinion corpus for arabic. J. Am. Soc. Inf. Sci. Technol. 62, 10 (2011), 2045--2054. Google Scholar
Digital Library
- Anas Shahrour, Salam Khalifa, and Nizar Habash. 2016. Improving arabic diacritization through syntactic analysis. In LREC.Google Scholar
- Amira Shoukry and Ahmed Rafea. 2012. Sentence-level arabic sentiment analysis. In Proceedings of the 2012 International Conference on Collaboration Technologies and Systems (CTS’12). IEEE, 546--550.Google Scholar
Cross Ref
- Richard Socher, Cliff C. Lin, Chris Manning, and Andrew Y. Ng. 2011a. Parsing natural scenes and natural language with recursive neural networks. In Proceedings of the 28th international conference on machine learning (ICML’11). 129--136. Google Scholar
Digital Library
- Richard Socher, Jeffrey Pennington, Eric H. Huang, Andrew Y. Ng, and Christopher D. Manning. 2011b. Semi-supervised recursive autoencoders for predicting sentiment distributions. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 151--161. Google Scholar
Digital Library
- Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’13), Vol. 1631. Citeseer, 1642.Google Scholar
- Kai Sheng Tai, Richard Socher, and Christopher D. Manning. 2015. Improved semantic representations from tree-structured long short-term memory networks. arXiv:1503.00075 (2015).Google Scholar
- Duyu Tang, Bing Qin, and Ting Liu. 2015. Document modeling with gated recurrent neural network for sentiment classification. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 1422--1432.Google Scholar
Cross Ref
- UNESCO. 2014. World Arabic Language Day. Retrieved from http://english.alarabiya.net/articles/2012/12/18/2558 53.html.Google Scholar
Index Terms
AROMA: A Recursive Deep Learning Model for Opinion Mining in Arabic as a Low Resource Language
Recommendations
A Sentiment Treebank and Morphologically Enriched Recursive Deep Models for Effective Sentiment Analysis in Arabic
Accurate sentiment analysis models encode the sentiment of words and their combinations to predict the overall sentiment of a sentence. This task becomes challenging when applied to morphologically rich languages (MRL). In this article, we evaluate the ...
Part-of-Speech (POS) Tagging Using Deep Learning-Based Approaches on the Designed Khasi POS Corpus
Part-of-speech (POS) tagging is one of the research challenging fields in natural language processing (NLP). It requires good knowledge of a particular language with large amounts of data or corpora for feature engineering, which can lead to achieving a ...
DeepNetDevanagari: a deep learning model for Devanagari ancient character recognition
AbstractDevanagari script is the most widely used script in India and other Asian countries. There is a rich collection of ancient Devanagari manuscripts, which is a wealth of knowledge. To make these manuscripts available to people, efforts are being ...






Comments