Abstract
Text categorization is an important task in Natural Language Processing (NLP). Its goal is to learn a model that can accurately classify any textual document in a given language into one of a set of predefined categories. For Arabic, several approaches have been proposed to tackle this problem, many of which rely on the bag-of-words assumption. Although these methods usually produce good classification results, they often fail to capture contextual dependencies in textual data. Deep learning architectures based on Recurrent Neural Networks (RNNs) or Convolutional Neural Networks (CNNs) do not suffer from this limitation and have recently shown very promising results in various NLP applications. In this work, we use deep learning models that combine RNNs and CNNs for Arabic text categorization, using static, dynamic, and fine-tuned word embeddings. Experimental results on the Open Source Arabic Corpora (OSAC) dataset show the effectiveness and high performance of our proposed models.
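To make the combined architecture concrete, the following is a minimal illustrative sketch in Keras, not the authors' exact model: a trainable (fine-tunable) embedding layer feeding a 1D convolution for local n-gram features, followed by a bidirectional GRU for longer-range contextual dependencies. All sizes (vocabulary, sequence length, embedding dimension, number of categories) are assumed placeholder values.

```python
# Illustrative CNN + RNN text classifier sketch (assumed hyperparameters,
# not the paper's exact architecture).
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

VOCAB_SIZE = 20000   # assumed vocabulary size
SEQ_LEN = 200        # assumed (padded) document length in tokens
EMBED_DIM = 100      # assumed embedding dimensionality
NUM_CLASSES = 10     # assumed number of topic categories

model = keras.Sequential([
    keras.Input(shape=(SEQ_LEN,)),
    # trainable=True allows pretrained word vectors to be fine-tuned;
    # freezing this layer would correspond to static embeddings.
    layers.Embedding(VOCAB_SIZE, EMBED_DIM, trainable=True),
    layers.Conv1D(128, kernel_size=3, activation="relu"),  # local n-gram features
    layers.MaxPooling1D(pool_size=2),
    layers.Bidirectional(layers.GRU(64)),                  # contextual dependencies
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Sanity check on a random batch of token-id sequences.
batch = np.random.randint(0, VOCAB_SIZE, size=(4, SEQ_LEN))
probs = model.predict(batch, verbose=0)
print(probs.shape)  # one probability distribution per document
```

In a real experiment, the `Embedding` layer would be initialized from pretrained vectors (e.g. word2vec or fastText, both cited below), and the choice between static, dynamic, and fine-tuned embeddings reduces to how that layer's weights are set and whether they are trainable.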
References

- Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Software available from http://tensorflow.org/.
- Mariem Abbes, Zied Kechaou, and Adel M. Alimi. 2017. Enhanced deep learning models for sentiment analysis in Arab social media. In International Conference on Neural Information Processing. Springer, 667--676.
- Pulkit Agrawal, Ross Girshick, and Jitendra Malik. 2014. Analyzing the performance of multilayer neural networks for object recognition. In European Conference on Computer Vision. Springer, 329--344.
- Fawaz S. Al-Anzi and Dia AbuZeina. 2017. Toward an enhanced Arabic text classification using cosine similarity and Latent Semantic Indexing. Journal of King Saud University-Computer and Information Sciences 29, 2 (2017), 189--195.
- Fawaz S. Al-Anzi and Dia AbuZeina. 2018. Beyond vector space model for hierarchical Arabic text classification: A Markov chain approach. Information Processing & Management 54, 1 (2018), 105--115.
- Mahmoud Al-Ayyoub, Aya Nuseir, Kholoud Alsmearat, Yaser Jararweh, and Brij Gupta. 2018. Deep learning for Arabic NLP: A survey. Journal of Computational Science 26 (2018), 522--531.
- Sadam Al-Azani and El-Sayed M. El-Alfy. 2017. Hybrid deep learning for sentiment polarity determination of Arabic microblogs. In International Conference on Neural Information Processing. Springer, 491--500.
- Sami Al-Harbi, Abdulrahman Almuhareb, Abdulmohsen Al-Thubaity, Mohammed Khorsheed, and Abdullah Al-Rajeh. 2008. Automatic Arabic text classification. In Proceedings of the 9th International Conference on Statistical Analysis of Textual Data. 77--83.
- Mohammad Al-Smadi, Omar Qawasmeh, Mahmoud Al-Ayyoub, Yaser Jararweh, and Brij Gupta. 2018. Deep recurrent neural network vs. support vector machine for aspect-based sentiment analysis of Arabic hotels' reviews. Journal of Computational Science 27 (2018), 386--393. DOI: https://doi.org/10.1016/j.jocs.2017.11.006
- Mayy M. Al-Tahrawi and Sumaya N. Al-Khatib. 2015. Arabic text classification using polynomial networks. Journal of King Saud University-Computer and Information Sciences 27, 4 (2015), 437--449.
- Mohamed Seghir Hadj Ameur, Ahmed Guessoum, and Farid Meziane. 2019. Improving Arabic neural machine translation via n-best list re-ranking. Machine Translation 33, 4 (2019), 279--314. DOI: https://doi.org/10.1007/s10590-019-09237-6
- Mohamed Seghir Hadj Ameur, Farid Meziane, and Ahmed Guessoum. 2017. Arabic machine transliteration using an attention-based encoder-decoder model. In 3rd International Conference on Arabic Computational Linguistics, ACLING 2017, November 5-6, 2017, Dubai, United Arab Emirates. 287--297. DOI: https://doi.org/10.1016/j.procs.2017.10.120
- Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, May 7-9, 2015, Conference Track Proceedings. http://arxiv.org/abs/1409.0473.
- Riadh Belkebir and Ahmed Guessoum. 2013. A hybrid BSO-Chi2-SVM approach to Arabic text categorization. In 2013 ACS International Conference on Computer Systems and Applications (AICCSA). 1--7. DOI: https://doi.org/10.1109/AICCSA.2013.6616437
- Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. TACL 5 (2017), 135--146. https://transacl.org/ojs/index.php/tacl/article/view/999.
- Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. On the properties of neural machine translation: Encoder-decoder approaches. In SSST@EMNLP 2014, 8th Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014. 103--111. DOI: https://doi.org/10.3115/v1/W14-4012
- François Chollet et al. 2015. Keras. Retrieved from https://keras.io.
- Sumit Chopra, Michael Auli, and Alexander M. Rush. 2016. Abstractive sentence summarization with attentive recurrent neural networks. In 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 93--98.
- Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. Journal of Machine Learning Research 12, Aug (2011), 2493--2537.
- Alexis Conneau, Holger Schwenk, Loïc Barrault, and Yann Lecun. 2017. Very deep convolutional networks for text classification. In 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, Vol. 1. 1107--1116.
- Mohamed El Kourdi, Amine Bensaid, and Tajje-eddine Rachidi. 2004. Automatic Arabic document categorization based on the Naïve Bayes algorithm. In Workshop on Computational Approaches to Arabic Script-based Languages. Association for Computational Linguistics, 51--58.
- Christoph Goller and Andreas Kuchler. 1996. Learning task-dependent distributed representations by backpropagation through structure. In IEEE International Conference on Neural Networks, 1996, Vol. 1. IEEE, 347--352.
- Chinnappa Guggilla. 2016. Discrimination between similar languages, varieties and dialects using CNN- and LSTM-based deep neural networks. In 3rd Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3). 185--194.
- Mohamed Seghir Hadj Ameur, Youcef Moulahoum, and Ahmed Guessoum. 2015. Restoration of Arabic diacritics using a multilevel statistical model. In Computer Science and Its Applications, Abdelmalek Amine (Ed.). Springer International Publishing, Cham, 181--192. DOI: https://doi.org/10.1007/978-3-319-19578-0_15
- Fouzi Harrag, Eyas El-Qawasmah, and Abdul Malik S. Al-Salman. 2011. Stemming as a feature reduction technique for Arabic text categorization. In 10th International Symposium on Programming and Systems (ISPS'11). IEEE, 128--133.
- Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735--1780.
- Jeremy Howard and Sebastian Ruder. 2018. Universal language model fine-tuning for text classification. In 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 1: Long Papers. 328--339. DOI: https://doi.org/10.18653/v1/P18-1031
- Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In 32nd International Conference on Machine Learning (ICML'15). JMLR.org, 448--456. http://dl.acm.org/citation.cfm?id=3045118.3045167.
- Vasu Jindal. 2016. A personalized Markov clustering and deep learning approach for Arabic text categorization. In ACL 2016 Student Research Workshop. 145--151.
- Rie Johnson and Tong Zhang. 2015. Effective use of word order for text categorization with convolutional neural networks. In NAACL HLT 2015, The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, May 31--June 5, 2015. 103--112. https://www.aclweb.org/anthology/N15-1011/.
- Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A Meeting of SIGDAT, a Special Interest Group of the ACL. 1746--1751. https://www.aclweb.org/anthology/D14-1181/.
- Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, May 7-9, 2015, Conference Track Proceedings. http://arxiv.org/abs/1412.6980.
- Kanako Komiya and Hiroyuki Shinnou. 2018. Investigating effective parameters for fine-tuning of word embeddings using only a small corpus. In Workshop on Deep Learning Approaches for Low-Resource NLP. 60--67.
- Ji Young Lee, Franck Dernoncourt, and Peter Szolovits. 2018. Transfer learning for named-entity recognition with neural networks. In 11th International Conference on Language Resources and Evaluation, LREC 2018, Miyazaki, Japan, May 7-12, 2018. http://www.lrec-conf.org/proceedings/lrec2018/summaries/878.html.
- Edward Loper and Steven Bird. 2002. NLTK: The natural language toolkit. In ACL-02 Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics - Volume 1 (ETMTNLP'02). Association for Computational Linguistics, Stroudsburg, PA, 63--70. DOI: https://doi.org/10.3115/1118108.1118117
- Marc Moreno Lopez and Jugal Kalita. 2017. Deep learning applied to NLP. CoRR abs/1703.03091 (2017). http://arxiv.org/abs/1703.03091.
- Prem Melville, Wojciech Gryc, and Richard D. Lawrence. 2009. Sentiment analysis of blogs by combining lexical knowledge with text classification. In 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1275--1284.
- Abdelwadood Mesleh and Ghassan Kanaan. 2008. Support vector machine text classification system: Using ant colony optimization based feature subset selection. In 2008 International Conference on Computer Engineering Systems. 143--148. DOI: https://doi.org/10.1109/ICCES.2008.4772984
- Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. In 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, May 2-4, 2013, Workshop Track Proceedings. http://arxiv.org/abs/1301.3781.
- Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems. 3111--3119.
- Vinod Nair and Geoffrey E. Hinton. 2010. Rectified linear units improve restricted Boltzmann machines. In 27th International Conference on Machine Learning (ICML-10). 807--814.
- Dong Nguyen and A. Seza Doğruöz. 2013. Word level language identification in online multilingual communication. In 2013 Conference on Empirical Methods in Natural Language Processing. 857--862.
- Motaz K. Saad and Wesam Ashour. 2010. OSAC: Open Source Arabic Corpora. In 6th International Conference on Electrical and Computer Systems, Vol. 10.
- Dominik Scherer, Andreas Müller, and Sven Behnke. 2010. Evaluation of pooling operations in convolutional architectures for object recognition. Artificial Neural Networks and Machine Learning--ICANN (2010), 92--101.
- Tobias Schnabel, Igor Labutov, David Mimno, and Thorsten Joachims. 2015. Evaluation methods for unsupervised word embeddings. In 2015 Conference on Empirical Methods in Natural Language Processing. 298--307.
- Mike Schuster and Kuldip K. Paliwal. 1997. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing 45, 11 (1997), 2673--2681.
- David Sculley and Gabriel M. Wachman. 2007. Relaxed online SVMs for spam filtering. In 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 415--422.
- Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15, 1 (2014), 1929--1958.
- Baoxin Wang. 2018. Disconnected recurrent neural networks for text categorization. In 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2311--2320.
- Peilu Wang, Yao Qian, Frank K. Soong, Lei He, and Hai Zhao. 2015. Part-of-speech tagging with bidirectional long short-term memory recurrent neural network. CoRR abs/1510.06168 (2015). http://arxiv.org/abs/1510.06168.
- Ronald J. Williams and David Zipser. 1989. A learning algorithm for continually running fully recurrent neural networks. Neural Computation 1, 2 (1989), 270--280.
- Edwin B. Wilson. 1927. Probable inference, the law of succession, and statistical inference. J. Amer. Statist. Assoc. 22, 158 (1927), 209--212.
- Bilal Zahran and Ghassan Kanaan. 2009. Text feature selection using particle swarm optimization algorithm. World Applied Sciences Journal 7 (2009), 69--74.
- Dejun Zhang, Long Tian, Mingbo Hong, Fei Han, Yafeng Ren, and Yilin Chen. 2018. Combining convolution neural network and bidirectional gated recurrent unit for sentence semantic classification. IEEE Access 6 (2018), 73750--73759. DOI: https://doi.org/10.1109/ACCESS.2018.2882878
- Chunting Zhou, Chonglin Sun, Zhiyuan Liu, and Francis C. M. Lau. 2015. A C-LSTM neural network for text classification. CoRR abs/1511.08630 (2015). http://arxiv.org/abs/1511.08630.
- Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. 2008. Evaluation in Information Retrieval. Cambridge University Press, 139--161. DOI: https://doi.org/10.1017/CBO9780511809071.009
Robust Arabic Text Categorization by Combining Convolutional and Recurrent Neural Networks