Abstract
Arabic is one of the world's major languages, and many efforts have been made to provide computing solutions that support it. Developing Arabic chatbots is still an evolving research field that requires extra effort because of the nature of the language. Stemming is a common step in natural language processing applications. It is important for chatbot development because it helps pre-process the input data and can be used in different phases of the development process. The aim of this article is to combine a scoring approach with Arabic stemming techniques to develop an Arabic chatbot conversation engine. Two experiments were conducted to evaluate the proposed solution. The first selected the stemmer that is most accurate when used with our solution, since our algorithm can support various stemmers. The second evaluated our proposed approach against various machine learning models. The results show that the ISRIS stemming algorithm is the best fit for our solution, with an accuracy of 78.06%. The results also indicate that our solution achieved an F1 score of 65.5%, while the other machine learning models achieved slightly lower scores. Our study presents a novel technique that combines scoring mechanisms with stemming processes to produce the best answer for every query sent by chatbot users, compared with other approaches. This can help in developing Arabic chatbots and can support many domains, such as education, business, and health. This technique is among the first developed specifically to serve the development of Arabic chatbot conversation engines.
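The abstract does not spell out the scoring mechanism, so the following is only a minimal sketch of the general idea of pairing a pluggable stemmer with a score that ranks stored answers. Everything in it is an assumption made for illustration: the affix lists, the Jaccard overlap score, and the names `toy_light_stem`, `score`, and `best_answer` are hypothetical and are not the authors' algorithm or the ISRIS stemmer.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical affix lists for a toy light stemmer (NOT ISRIS or any published stemmer).
PREFIXES = ("وال", "بال", "ال", "و")
SUFFIXES = ("ات", "ون", "ين", "ة")


def toy_light_stem(word: str) -> str:
    """Strip at most one prefix and one suffix, keeping at least three letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    return word


def stem_set(text: str, stem: Callable[[str], str]) -> Set[str]:
    """Whitespace-tokenize the text and return the set of stemmed tokens."""
    return {stem(token) for token in text.split()}


def score(query: str, pattern: str, stem: Callable[[str], str]) -> float:
    """Jaccard overlap between the stem sets of the query and a stored pattern."""
    q, p = stem_set(query, stem), stem_set(pattern, stem)
    union = q | p
    return len(q & p) / len(union) if union else 0.0


def best_answer(
    query: str,
    knowledge_base: List[Tuple[str, str]],
    stem: Callable[[str], str] = toy_light_stem,
) -> Tuple[str, float]:
    """Return the answer whose pattern scores highest, together with that score."""
    pattern, answer = max(knowledge_base, key=lambda pair: score(query, pair[0], stem))
    return answer, score(query, pattern, stem)


# Illustrative (pattern, answer) pairs; the answers are placeholder labels.
KNOWLEDGE_BASE = [
    ("مواعيد المحاضرات", "SCHEDULE"),  # "lecture times"
    ("رسوم التسجيل", "FEES"),          # "registration fees"
]

answer, confidence = best_answer("ما هي رسوم التسجيل", KNOWLEDGE_BASE)
```

Because the stemmer is passed in as a function, a different stemmer can be swapped in without touching the scoring code, which mirrors the abstract's point that the approach can support various stemmers.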
Combining a Novel Scoring Approach with Arabic Stemming Techniques for Arabic Chatbots Conversation Engine