Robust Arabic Text Categorization by Combining Convolutional and Recurrent Neural Networks

Published: 01 July 2020

Abstract

Text categorization is an important task in Natural Language Processing (NLP). Its goal is to learn a model that can accurately classify any textual document in a given language into one of a set of predefined categories. In the context of the Arabic language, several approaches have been proposed to tackle this problem, many of which rely on the bag-of-words assumption. Although these methods usually produce good classification results, they often fail to capture contextual dependencies in textual data. In contrast, deep learning architectures based on Recurrent Neural Networks (RNNs) or Convolutional Neural Networks (CNNs) do not suffer from this limitation and have recently shown very promising results in various NLP applications. In this work, we use deep learning models that combine RNNs and CNNs for the task of Arabic text categorization, using static, dynamic, and fine-tuned word embeddings. Experimental results on the Open Source Arabic Corpora (OSAC) dataset demonstrate the effectiveness and high performance of the proposed models.
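The abstract describes combining a convolutional feature extractor with a recurrent layer for text classification. The paper's exact architecture is not reproduced here, so the following NumPy sketch is only illustrative of the general CNN-then-RNN pattern: all dimensions, the Elman-style recurrence, and the random weights are hypothetical, not the authors' model.

```python
import numpy as np

# Illustrative sketch (NOT the authors' architecture): a 1D convolution
# extracts local n-gram features from word embeddings, then a simple
# recurrent pass models the order of those features before classification.

rng = np.random.default_rng(0)

seq_len, emb_dim = 12, 8    # toy document: 12 tokens, 8-dim embeddings
n_filters, kernel = 6, 3    # 6 convolution filters over 3-token windows
hidden = 5                  # recurrent hidden-state size
n_classes = 4               # e.g. a handful of OSAC-style categories

X = rng.normal(size=(seq_len, emb_dim))        # stand-in word embeddings
W_conv = rng.normal(size=(n_filters, kernel, emb_dim))

# 1) Convolution: slide each filter over 3-token windows (valid padding, ReLU).
conv_out = np.array([
    [np.maximum(np.sum(W_conv[f] * X[t:t + kernel]), 0.0)
     for f in range(n_filters)]
    for t in range(seq_len - kernel + 1)
])                                             # shape: (10, n_filters)

# 2) Recurrence: an Elman-style pass over the convolutional feature sequence.
W_xh = rng.normal(size=(n_filters, hidden))
W_hh = rng.normal(size=(hidden, hidden))
h = np.zeros(hidden)
for x_t in conv_out:
    h = np.tanh(x_t @ W_xh + h @ W_hh)

# 3) Classification: softmax over the final hidden state.
W_out = rng.normal(size=(hidden, n_classes))
logits = h @ W_out
probs = np.exp(logits - logits.max())
probs /= probs.sum()
predicted = int(np.argmax(probs))
```

In practice the recurrent component would be an LSTM or GRU (as in the cited C-LSTM work) implemented in TensorFlow/Keras, but the data flow is the same: embeddings → convolution → recurrence → softmax.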

