AROMA: A Recursive Deep Learning Model for Opinion Mining in Arabic as a Low Resource Language

Published: 13 July 2017

Abstract

While research on English opinion mining has achieved significant progress and success, work on Arabic opinion mining still lags behind. This is mainly due to the relative recency of research on natural language processing (NLP) methods for Arabic, the difficulty of handling its morphological complexity, and the lack of large-scale opinion resources for Arabic. To close this gap, we examine a class of models that are used for English and that do not require extensive use of NLP or opinion resources. In particular, we consider the Recursive Auto Encoder (RAE). However, RAE models are not as successful in Arabic as they are in English because of their limitations in handling the morphological complexity of Arabic, in providing more complete and comprehensive input features for the auto encoder, and in performing semantic composition that follows the natural way constituents are combined to express the overall meaning. In this article, we propose A Recursive Deep Learning Model for Opinion Mining in Arabic (AROMA), which addresses these limitations. AROMA was evaluated on three Arabic corpora representing different genres and writing styles. Results show that AROMA achieved significant performance improvements compared to the baseline RAE. It also outperformed several well-known approaches in the literature.
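As a rough illustration of the baseline the abstract refers to, the sketch below shows one common formulation of a recursive auto encoder: two child vectors are encoded into a parent vector, the parent is decoded back, and adjacent pairs are merged greedily by lowest reconstruction error. This is not the AROMA model and is not taken from the article. The function names (`make_params`, `compose`, `greedy_encode`), the random untrained weights, the tanh nonlinearity, and the greedy merge order are all assumptions made for illustration.

```python
import math
import random


def make_params(dim, seed=0):
    """Create small random (untrained) weights; hypothetical helper for illustration."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_enc": mat(dim, 2 * dim), "b_enc": [0.0] * dim,        # encoder: 2*dim -> dim
        "W_dec": mat(2 * dim, dim), "b_dec": [0.0] * (2 * dim),  # decoder: dim -> 2*dim
    }


def affine(W, x, b):
    """Compute W x + b for a list-of-lists matrix and list vectors."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def compose(params, left, right):
    """Encode two children into a parent and return (parent, reconstruction error)."""
    children = left + right  # concatenation [c1; c2]
    parent = [math.tanh(v) for v in affine(params["W_enc"], children, params["b_enc"])]
    recon = [math.tanh(v) for v in affine(params["W_dec"], parent, params["b_dec"])]
    error = sum((c - r) ** 2 for c, r in zip(children, recon))
    return parent, error


def greedy_encode(params, vectors):
    """Repeatedly merge the adjacent pair with the lowest reconstruction error."""
    nodes = [list(v) for v in vectors]
    merges = []  # (position of merged pair, its reconstruction error)
    while len(nodes) > 1:
        candidates = [compose(params, nodes[i], nodes[i + 1])
                      for i in range(len(nodes) - 1)]
        best = min(range(len(candidates)), key=lambda i: candidates[i][1])
        parent, error = candidates[best]
        nodes[best:best + 2] = [parent]
        merges.append((best, error))
    return nodes[0], merges
```

In this sketch the merge order depends only on reconstruction error, not on syntactic structure. That is one way to picture the abstract's remark that the baseline does not compose meaning along the natural grouping of constituents.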


    • Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 16, Issue 4
      December 2017
      146 pages
      ISSN: 2375-4699
      EISSN: 2375-4702
      DOI: 10.1145/3097269

      Copyright © 2017 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 13 July 2017
      • Accepted: 1 April 2017
      • Revised: 1 February 2017
      • Received: 1 May 2016

      Qualifiers

      • research-article
      • Research
      • Refereed
